COMPOSITIONS AND METHODS FOR DIAGNOSTICS AND THERAPEUTICS FOR HYDROCEPHALUS

CROSS REFERENCE TO RELATED APPLICATIONS 

Priority is claimed to US provisional patent application no. 60/374,184 filed April 19, 2002, and to US provisional patent application no. 60/388,266 filed June 13, 2002, both of which are incorporated herein by reference.

FIELD OF THE DISCLOSURE 

The present disclosure relates to congenital hydrocephalus, and particularly to a new variant protein that is associated with its development. Also disclosed are methods of determining an individual's risk of developing disease states and conditions.

BACKGROUND 

Congenital hydrocephalus is a common birth defect that is estimated to occur with a frequency of 0.5-1.8 per 1000 births (Howard et al., J. Med. Genet., 18:252-255 [1981]). It has been estimated that about 2/3 of patients with congenital hydrocephalus have some degree of aqueductal stenosis (Duckett, S., Pediatric Neuropathology, p. 199 [1995]), which results in an excess of cerebrospinal fluid (CSF) in the ventricles of the brain. This excess fluid causes expansion of, and trauma to, the surrounding brain tissue. Hydrocephalus has significant social and economic costs: in 1993, shunt placement surgery cost almost $100 million. Congenital hydrocephalus also has adverse effects on the developing brain, which may persist as neurological deficits in children and adults, such as mental retardation, cerebral palsy, epilepsy and visual disabilities.

Many cases of hydrocephalus are caused by X-linked genetic mutations. The causes of other cases of congenital and familial congenital hydrocephalus are unknown. Current diagnostic procedures are very limited because they reveal hydrocephalus only after significant malformations have occurred. These procedures include X-ray, magnetic resonance imaging (MRI) and computed axial tomography (CAT) scans.

Regulatory factor X (RFX) members are evolutionarily conserved transcription factors that share a highly conserved winged helix DNA-binding domain. Human RFX4 contains evolutionarily conserved regions, including an RFX-type DNA-binding domain, a dimerization domain, and other conserved regions, and is structurally closely related to RFX1, RFX2, and RFX3. RFX4 is associated with breast cancer, and is expressed in testis.

In view of these considerations, there is a need for systems and methods for better understanding, diagnosing, and controlling the complex biological processes that result in congenital hydrocephalus.

SUMMARY OF THE DISCLOSURE 

A new splice variant of RFX4 has been found, and is identified herein as RFX4_v3. It has surprisingly been determined that this new variant is associated with the development of neurological structures, and that its reduction or absence promotes the development of congenital hydrocephalus.
This disclosure therefore provides a substantially purified RFX4_v3 polypeptide, and in particular such a polypeptide that includes an amino acid sequence at least 70% identical (for example at least 80%, 85%, 90%, or 95% identical) to the human amino acid sequence set forth as SEQ ID NO: 8, a conservative variant of that sequence, or a sequence that is 100% identical to SEQ ID NO: 8. The polypeptide has RFX4_v3 activity, and the N-terminus of the polypeptide is at least 90% (for example at least 95% or 98%) identical to residues 1-14 of the human SEQ ID NO: 8.
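The short calculation below makes the identity thresholds above concrete. The disclosure does not say which alignment program or identity convention applies, so this sketch makes three assumptions. The two sequences have already been aligned. A gap opposite a residue counts as a mismatch. Columns that are gaps in both sequences are ignored. The sketch also shows the arithmetic behind the N-terminal requirement. Over the 14 N-terminal residues, at least 90% identity means at least 13 identical residues: 13/14 is about 92.9%, while 12/14 is about 85.7%. The peptide strings are arbitrary placeholders, not the actual SEQ ID NO: 8 sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: a gap opposite a residue counts as a mismatch,
    and columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # empty column in both sequences
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def min_identical_residues(length: int, threshold_pct: int) -> int:
    """Smallest number of identical positions reaching threshold_pct over length positions."""
    # Integer ceiling of length * threshold_pct / 100, avoiding float rounding.
    return (length * threshold_pct + 99) // 100


if __name__ == "__main__":
    # Placeholder 14-residue peptides (NOT the real RFX4_v3 N-terminus).
    reference_nterm = "ACDEFGHIKLMNPQ"
    candidate_nterm = "ACDEFGHIKLMNPW"  # one substitution at position 14
    pid = percent_identity(reference_nterm, candidate_nterm)
    print(f"{pid:.1f}% identical; meets 90% threshold: {pid >= 90}")
    for pct in (90, 95, 98):
        print(pct, "% of 14 residues ->", min_identical_residues(14, pct), "identical residues")
```

Under this convention, the 95% and 98% N-terminal thresholds both require all 14 residues to be identical.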

In particular embodiments, the RFX4_v3 polypeptide includes a murine amino acid sequence 
(SEQ ID NO: 6) or a zebrafish sequence (SEQ ID NO: 10), or a sequence having at least 85% identity 
(for example at least 95% or even 100% sequence identity) to SEQ ID NO: 8. 

Also provided are isolated nucleic acid molecules encoding the disclosed polypeptides. In some embodiments, the nucleic acid molecule includes a nucleic acid sequence at least 70% identical (for example at least 80%, 90% or 95% identical) to the human nucleic acid sequence set forth as SEQ ID NO: 37. Alternatively, the nucleic acid sequence is at least 80% or 90% (for example at least 95% or 98%) identical to the murine sequence SEQ ID NO: 38 or zebrafish sequence SEQ ID NO: 39.

The nucleic acid sequence may be operably linked to a heterologous promoter, for example a promoter having the sequence shown in SEQ ID NO: 11 or SEQ ID NO: 12. The nucleic acid molecule may also be included in a vector, and host cells are disclosed that are transformed with the vector. Examples of such host cells are a plant cell, an animal cell, or a prokaryotic cell.

Also provided herein is an isolated nucleic acid molecule that hybridizes under conditions of low stringency to a target nucleic acid molecule selected from the group consisting of nucleotides 1-42 of SEQ ID NO: 37, SEQ ID NO: 38, and SEQ ID NO: 39, wherein the isolated nucleic acid molecule is at least 15 nucleotides in length. In more particular embodiments, the isolated nucleic acid molecule hybridizes under conditions of high stringency to the target nucleic acid molecule, for example a target nucleic acid molecule that encodes an RFX4_v3 polypeptide (such as the human SEQ ID NO: 8, the murine SEQ ID NO: 6, or the zebrafish SEQ ID NO: 10). This isolated nucleic acid sequence can be incorporated into a vector, and introduced into a host cell.
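Assume SEQ ID NO: 37 begins at the start codon. The 14 N-terminal residues unique to RFX4_v3 are then encoded by 14 × 3 = 42 nucleotides. A probe of at least 15 nucleotides that lies entirely within nucleotides 1-42 therefore targets the RFX4_v3-specific coding region. The sketch below enumerates such antisense probes from a coding sequence. It treats hybridization simply as perfect Watson-Crick complementarity. Real low- or high-stringency behavior depends on salt, temperature and mismatch tolerance, none of which this sketch models. The input is an arbitrary placeholder sequence, not SEQ ID NO: 37.

```python
from typing import Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_probes(coding_seq: str, region_end: int = 42,
                     probe_len: int = 15) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, probe) for antisense probes lying wholly within
    nucleotides 1..region_end (1-based, inclusive) of coding_seq."""
    if probe_len < 15:
        raise ValueError("probes are at least 15 nucleotides long")
    region = coding_seq[:region_end]
    if len(region) < probe_len:
        raise ValueError("region is shorter than the probe")
    for start in range(len(region) - probe_len + 1):
        target = region[start:start + probe_len]
        # The probe is complementary and antiparallel to the sense-strand target.
        yield start + 1, start + probe_len, reverse_complement(target)


if __name__ == "__main__":
    placeholder_cds = ("ATGC" * 12)[:42] + "TAA"  # toy sequence only
    probes = list(candidate_probes(placeholder_cds))
    print(len(probes), "candidate 15-mers; first:", probes[0])
```

A 42-nucleotide region yields 42 - 15 + 1 = 28 windows of length 15. Longer probes give fewer windows.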

The RFX4_v3 polypeptide inhibits the phenotypic expression of congenital hydrocephalus, and has the ability to bind to RFX4_v3-specific antibodies (such as antibodies that distinguish RFX4_v1 and RFX4_v2 from RFX4_v3). In particular embodiments, the polypeptide includes the 14 consecutive N-terminal amino acid residues of SEQ ID NO: 8, SEQ ID NO: 6, or SEQ ID NO: 10, which are not found in RFX4_v1 or RFX4_v2.

Also disclosed are methods for producing a variant of a RFX4_v3 polypeptide, by 
mutagenizing the wild-type nucleic acid sequence of SEQ ID NO: 37, SEQ ID NO: 38, or SEQ ID NO: 
39; and screening the variant for a RFX4_v3 activity. 

Compositions are also provided that include a nucleic acid molecule that inhibits the binding of the first 42 nucleotides of SEQ ID NO: 37, SEQ ID NO: 38, or SEQ ID NO: 39 to its complementary sequence. For example, the nucleic acid molecule is a polynucleotide sequence comprising at least fifteen nucleotides capable of hybridizing under stringent conditions to nucleotides 1-42 of SEQ ID NO: 37.
Methods are also disclosed for detecting a nucleic acid molecule in a biological sample, wherein the nucleic acid molecule encodes an RFX4_v3 polypeptide, by hybridizing a polynucleotide to the nucleic acid molecule to produce a hybridization complex, wherein the polynucleotide hybridizes to nucleotides 1-42 of SEQ ID NO: 37, SEQ ID NO: 38, or SEQ ID NO: 39, and detecting the hybridization complex. The hybridization complex indicates the presence of a polynucleotide encoding RFX4_v3 in the biological sample. In particular embodiments, the polynucleotide hybridizes to the human sequence, SEQ ID NO: 37. The nucleic acid molecule in the biological sample may be amplified prior to hybridizing with the polynucleotide.

Methods are also provided for identifying a subject at risk of developing RFX4_v3-linked hydrocephalus, by detecting in the subject an abnormality in an RFX4_v3 polypeptide or in an RFX4_v3 nucleotide sequence that alters expression of RFX4_v3. For example, the abnormality may be detected by detecting a mutation in a nucleic acid sequence that encodes RFX4_v3, wherein the mutation is associated with RFX4_v3-linked hydrocephalus. In one example, the abnormality is detected by performing a hybridization analysis with a nucleic acid probe that detects the mutation in the RFX4_v3 nucleic acid sequence. For example, the method identifies an individual carrying a mutated RFX4_v3 allele by obtaining from a subject a nucleic acid molecule that includes an RFX4_v3 allele. A mutation is then detected in the RFX4_v3 allele that results in phenotypic expression of congenital hydrocephalus.
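One simple illustration of detecting a mutation in an RFX4_v3 allele is to compare a subject's sequenced allele with a reference sequence, position by position. The sketch below handles only single-nucleotide substitutions in sequences that are already aligned and of equal length. Insertions and deletions, such as the transgene insertion described herein, would need an alignment step first. Substitutions are reported in HGVS-style coding DNA notation, and the sketch counts how many of a subject's two alleles differ from the reference. The sequences are short placeholders. Whether a given substitution is associated with hydrocephalus is outside what this code can determine.

```python
from typing import List, Tuple


def find_substitutions(reference: str, allele: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, observed base) for each mismatch."""
    if len(reference) != len(allele):
        raise ValueError("sketch handles equal-length (substitution-only) alleles")
    return [(i + 1, r, a)
            for i, (r, a) in enumerate(zip(reference.upper(), allele.upper()))
            if r != a]


def hgvs_like(subs: List[Tuple[int, str, str]]) -> List[str]:
    """Format substitutions as HGVS-style coding DNA notation, e.g. 'c.7A>G'."""
    return [f"c.{pos}{ref}>{alt}" for pos, ref, alt in subs]


def mutated_allele_count(reference: str, allele_1: str, allele_2: str) -> int:
    """Number of the subject's two alleles carrying at least one substitution."""
    return sum(1 for allele in (allele_1, allele_2) if find_substitutions(reference, allele))


if __name__ == "__main__":
    ref = "ATGGCTAAA"        # placeholder reference, not SEQ ID NO: 37
    subject_a = "ATGGCTAAA"  # matches the reference
    subject_b = "ATGGCTGAA"  # one substitution
    print(hgvs_like(find_substitutions(ref, subject_b)))
    n = mutated_allele_count(ref, subject_a, subject_b)
    print("mutated alleles:", n, "(carrier)" if n == 1 else "")
```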

In alternative embodiments, the abnormality is detected in the RFX4_v3 polypeptide. For example, reduced expression of the RFX4_v3 polypeptide is detected, or a mutation is detected in RFX4_v3 that results in phenotypic expression of congenital hydrocephalus. In certain examples the mutations are detected with an antibody (such as a monoclonal antibody) that specifically binds to the RFX4_v3 polypeptide.

To perform these detection methods, a biological sample is obtained from the subject, and the abnormality in the RFX4_v3 polypeptide or in the RFX4_v3 nucleotide sequence is detected in that sample. Specific examples of the biological sample include blood, amniotic fluid, plasma, a biopsy specimen, or cerebrospinal fluid.

A kit may also be used for determining whether a subject is a carrier of a mutated RFX4_v3 gene that is associated with congenital hydrocephalus. Such a kit may include a reagent that specifically detects a mutation in an RFX4_v3 allele, accompanied by instructions for determining whether the subject is at increased risk of expressing congenital hydrocephalus if the reagent specifically detects the mutation. Specific examples of the detection reagent are a nucleic acid probe that hybridizes under stringent conditions to the nucleic acid sequence of SEQ ID NO: 37, SEQ ID NO: 38 or SEQ ID NO: 39, or an antibody that specifically binds the protein expressed by the RFX4_v3 allele.

Antibodies specific for an RFX4_v3 polypeptide may be obtained by injecting an animal with RFX4_v3 polypeptides or an immunogenic portion thereof, and preparing a hybridoma that expresses the monoclonal antibody. The RFX4_v3-specific antibody may be used for detection of RFX4_v3 polypeptides, or as a therapeutic agent.
This disclosure also provides a transgenic mouse having somatic and germ cells that include a disrupted endogenous RFX4_v3 gene, wherein the disruption is sufficient to produce an increased susceptibility to developing congenital hydrocephalus. The disrupted gene is, for example, introduced into an ancestor of the mouse at an embryonic stage. In certain embodiments the mouse, if homozygous for the disrupted gene, does not reproduce. A particular example of the disruption is an insertion within the RFX4_v3 gene, or a deletion or substitution within the RFX4_v3 gene.

Also disclosed herein are methods of making a non-human transgenic animal with a knockout for the RFX4_v3 gene, by disrupting an RFX4_v3 transcript, the disruption being sufficient to produce hydrocephalus in the transgenic animal, such as a mouse. Disrupting the RFX4_v3 transcript may include, for example, deleting or substituting any portion of the RFX4_v3 transcript, inserting an exogenous gene into the RFX4_v3 transcript, or any combination thereof. The transgenic mice may be crossed with each other to produce other transgenic animals having a similar phenotype.

Compounds may be screened for the ability to alter RFX4_v3 activity by providing a first polypeptide sequence comprising at least a portion of RFX4_v3, a second polypeptide sequence comprising at least a portion of a protein known to interact with RFX4_v3, and one or more test compounds. The polypeptide sequences are combined with each other and exposed to one or more test compounds under conditions such that the first polypeptide sequence, the second polypeptide sequence, and the test compound interact. The presence or absence of an interaction between the polypeptide sequences is then determined to detect a test compound that alters RFX4_v3 activity.
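At its core, the screening logic above compares the interaction readout obtained with each test compound against the readout obtained without it. The disclosure does not specify the assay format, so the sketch below makes two assumptions. The assay produces a generic numeric interaction signal. A hypothetical cutoff marks the level at or above which an interaction is scored as present. All compound names and signal values are made up. A compound is flagged when it changes the presence or absence of the interaction relative to the no-compound control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssayResult:
    compound: str   # hypothetical test-compound identifier
    signal: float   # generic interaction readout, arbitrary units


def interaction_present(signal: float, cutoff: float) -> bool:
    """Score the interaction as present when the signal reaches the cutoff."""
    return signal >= cutoff


def compounds_altering_interaction(control_signal: float,
                                   results: List[AssayResult],
                                   cutoff: float) -> List[str]:
    """Return compounds whose interaction call differs from the control call."""
    baseline = interaction_present(control_signal, cutoff)
    return [r.compound for r in results
            if interaction_present(r.signal, cutoff) != baseline]


if __name__ == "__main__":
    # Made-up readouts: RFX4_v3 portion + interacting-protein portion, with each compound.
    control = 12.0
    plate = [AssayResult("cmpd_A", 11.5), AssayResult("cmpd_B", 2.0)]
    print(compounds_altering_interaction(control, plate, cutoff=5.0))
```

Because the comparison is symmetric, the same function flags compounds that disrupt an existing interaction and compounds that induce one that is absent in the control.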

The present disclosure also provides a composition, such as a pharmaceutical composition, that includes the polypeptide. For example, the composition is a therapeutic composition that includes a therapeutically effective amount of the polypeptide. This disclosure also enables the treatment of congenital hydrocephalus, for example by administering a pharmaceutical composition that includes a therapeutically effective amount of an RFX4_v3 nucleic acid, an RFX4_v3 polypeptide, or a therapeutically effective variant or portion of either. Hydrocephalus can also be treated by administering to the subject a therapeutically effective amount of an agent that increases the presence of RFX4_v3 polypeptide in the brain of the subject. Examples of this therapeutic approach are administering exogenous RFX4_v3 polypeptide to the subject, increasing expression of RFX4_v3 polypeptide in the subject, or introducing into the subject a vector that expresses the RFX4_v3 polypeptide in the brain of the subject.

The foregoing and other features and advantages will become more apparent from the following detailed description of several embodiments.

BRIEF DESCRIPTION OF THE FIGURES 
Figure 1 shows the alignment of mouse RFX4_v3 sequences with human chromosome 12 genomic clone NT_009720 (SEQ ID NO: 36).

Figure 2 shows a schematic representation of 200 kb of human genomic sequence from NT_009720.8, shown in reverse complement orientation, and the positions within this sequence of the exons that make up the three indicated RFX4 transcripts. At the top of the figure is shown the transcript corresponding to RFX4_v2 (accession number NM_002920). Exon 1 in this transcript is unique to this transcript; exons 2-5 are shared with the novel RFX4_v3 transcript described herein; exons 6-15A are shared with the RFX4_v3 transcript as well as the transcript RFX4_v1; and exon 15B is apparently unique to this transcript, and contains a polyadenylation sequence and presumably a polyA tail, as indicated by the wavy line. The locations of these exons on the genomic sequence are indicated. Below the genomic sequence is represented the transcript RFX4_v1. It contains a unique exon 1; exons 2-11 shared with both RFX4_v2 and RFX4_v3; and exons 12-14 shared only with RFX4_v3. The RFX4_v3 transcript contains a unique exon 1; exons 2-5 shared only with RFX4_v2; exons 6-15 shared with both RFX4_v1 and RFX4_v2; and exons 16-18 shared only with RFX4_v1. The site of transgene insertion is indicated in the genomic clone by the black X in the intron between exons 13 and 14 of RFX4_v1; its position between exons 17 and 18 of RFX4_v3 is also indicated. The portions of the RFX4_v3 transcript coding for the 737-amino-acid human RFX4_v3 protein are indicated, as is the protein's DNA binding domain (DBD).
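Simple set operations can check the exon-sharing relationships described for Figure 2. In the sketch below, an exon shared with RFX4_v3 is labelled by its RFX4_v3 exon number, and exons absent from RFX4_v3 get their own labels. As a simplification, RFX4_v2 exon 15A is treated as equivalent to RFX4_v3 exon 15. In this model, RFX4_v1 exons 13 and 14 correspond to RFX4_v3 exons 17 and 18, which agrees with the two descriptions of the insertion site above. Exon 1 comes out as the only exon unique to RFX4_v3. The only transcripts containing both exons that flank the insertion site are RFX4_v1 and RFX4_v3.

```python
from typing import Dict, List, Set


def exons(first: int, last: int) -> Set[str]:
    """Labels for RFX4_v3-numbered exons first..last (inclusive)."""
    return {f"e{i}" for i in range(first, last + 1)}


def by_number(labels: Set[str]) -> List[str]:
    """Sort 'eN' labels numerically."""
    return sorted(labels, key=lambda label: int(label[1:]))


# Exon content per transcript; shared exons use RFX4_v3 numbering (simplified model).
RFX4_V3 = exons(1, 18)
RFX4_V2 = {"v2_exon1", "v2_exon15B"} | exons(2, 15)  # v2 exon 15A treated as e15
RFX4_V1 = {"v1_exon1"} | exons(6, 18)
TRANSCRIPTS: Dict[str, Set[str]] = {
    "RFX4_v1": RFX4_V1, "RFX4_v2": RFX4_V2, "RFX4_v3": RFX4_V3,
}

unique_to_v3 = RFX4_V3 - RFX4_V2 - RFX4_V1
shared_v3_v2_only = (RFX4_V3 & RFX4_V2) - RFX4_V1
shared_all_three = RFX4_V3 & RFX4_V2 & RFX4_V1
shared_v3_v1_only = (RFX4_V3 & RFX4_V1) - RFX4_V2

# The transgene insertion lies in the intron between RFX4_v3 exons 17 and 18.
FLANKING_EXONS = {"e17", "e18"}
transcripts_spanning_insertion = sorted(
    name for name, content in TRANSCRIPTS.items() if FLANKING_EXONS <= content
)

if __name__ == "__main__":
    print("unique to RFX4_v3:", by_number(unique_to_v3))
    print("RFX4_v3 and RFX4_v2 only:", by_number(shared_v3_v2_only))
    print("all three transcripts:", by_number(shared_all_three))
    print("RFX4_v3 and RFX4_v1 only:", by_number(shared_v3_v1_only))
    print("contain both insertion-flanking exons:", transcripts_spanning_insertion)
    print("exon counts:", {name: len(ex) for name, ex in TRANSCRIPTS.items()})
```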

Figure 3 shows the nucleic acid sequence alignment of human and mouse proximal promoters for RFX4_v3 (residues 3794-4000 of SEQ ID NO: 11 and residues 1-207 of SEQ ID NO: 12, respectively).

Figure 4 shows the amino acid sequence alignment of human, mouse and zebrafish RFX4_v3 at the amino terminal end (residues 1-178 of SEQ ID NO: 8, residues 1-180 of SEQ ID NO: 6, and residues 1-158 of SEQ ID NO: 10, respectively).

Figure 5 shows the amino acid sequence alignment of human and murine RFX4_v3 (SEQ ID 
NOs: 8 and 6, respectively). 

Figure 6 shows a schematic alignment of human, mouse and zebrafish RFX4_v3 amino acid sequences (SEQ ID NOs: 8, 6 and 10, respectively). The predicted amino acid sequences from these three RFX4_v3 orthologues were aligned using ClustalW. The position of the characteristic RFX DNA binding domain (DBD) is indicated by the box; other boxes contain the B and C boxes and the dimerization domain (DD). The shaded first 14 amino acids, labeled exon 1, are unique to RFX4_v3 (human); the next unshaded sequences represent exons 2-5 and are identical to sequences from RFX4_v2; the next shaded sequences represent exons 6-15 and are identical to sequences from both RFX4_v1 and RFX4_v2; and the next unshaded sequences represent exons 16-18 and are identical to sequences in RFX4_v1. Asterisks indicate amino acid identity; double dots indicate a high degree of amino acid similarity; and single dots indicate less similarity.

Figure 7 is a set of digital images showing hydrocephalus in adult TG mice. Figure 7A shows two mice in lateral (top) and frontal (bottom) view at about two months of age, showing the characteristic domed head and lateral displacement of the ears in the transgenic (TG) mouse compared to its wild-type (WT) littermate. Figure 7B shows parasagittal sections, stained with hematoxylin and eosin, of brains from four littermate mice, three TG and one WT, at about seven weeks of age. The marked dilatation of the lateral ventricles (LV) is obvious in the TG mice; however, there is no evidence for dilatation of the fourth ventricles (arrows). Bar = 1 mm.

Figure 8 is a set of digital images showing hydrocephalus in newborn TG mice. Serial rostral 
(R) to caudal (C) coronal sections, stained with hematoxylin and eosin, from newborn (P0.5) TG and 
WT littermates are shown, with each pair of sections representing approximately the same coronal 
plane. Note the extreme hydrocephalus apparent in the olfactory ventricles (OV) and the lateral
ventricles (LV) of the TG compared to the WT mouse. In the more posterior sections, note the similar 
appearance of the aqueduct of Sylvius (Aq) and the fourth ventricle (FV) in the WT and TG mice. 

Figure 9 is a set of digital images showing the aqueduct of Sylvius and SCO in WT and TG mice. Figure 9A shows coronal sections in a rostral (R) to caudal (C) direction from P0.5 WT and TG littermates stained with hematoxylin and eosin, demonstrating the apparent absence of the SCO in the TG mouse. Figure 9B shows similar sections stained with an antibody to Reissner's fibers. Note the near-absence of antibody staining in the TG section (top) compared to the WT section (bottom). The arrow in the top section indicates a small amount of antibody staining in one section from the KO mouse, indicating the presence of the Reissner's fiber antigen. The counterstain was hematoxylin; the bar in the bottom section in Figure 9B represents 50 µm, and the top section was further magnified 2.5 times.

Figure 10 is a set of digital images identifying the transgene insertion site. Figure 10A shows a Southern blot of genomic DNA from WT and TG mice, digested with the three restriction enzymes indicated and probed with a 3'-insertion site-specific probe. The arrows indicate the three single, novel bands hybridizing to the probe in the DNA from the TG mice, indicating the likelihood of a single transgene insertion site. Figure 10B shows a PCR-based analysis of genomic DNA from one litter of interbred TG mice, indicating the PCR products that were specific for the presence of the transgene (Transgene-specific) and those that were specific for the endogenous sequence that was interrupted by the transgene (Insertion site-specific). The transgene-specific primers were 5'-AGCCAGTAATAAGAACTGCAGA-3' (SEQ ID NO: 29) and 5'-GGCACTCTTAGCAAACCTCAGG-3' (SEQ ID NO: 30), which correspond to bp 264-285 of the human cytochrome P450 cDNA clone accession number NM_000775.2 and bp 5225-5246 of the mouse α-myosin heavy chain promoter clone accession number MMU71441, respectively. The insertion site-specific primers were 5'-CATGGAAAGGGCAGAGTGAGC-3' (SEQ ID NO: 31) and 5'-GGCCATTGTCACCACTCGTAA-3' (SEQ ID NO: 32), which correspond to bp 732-752 and bp 323-343 of mouse trace archive sequence gnl|ti|91911671, respectively. In both cases, the results were confirmed by PCR using different pairs of primers. The DNA samples are characterized as +/+, +/- or -/- according to the presence of the interrupted allele. Figure 10C shows a northern blot of total brain RNA from newborn mice of the +/+, +/- and -/- genotypes. This blot was probed with a mouse EST clone that was 94% identical over 284 bases to a region corresponding to the 3'-end of the human testis-specific RFX4 transcript H10145. The only visible transcript was approximately 4 kb (RFX4_v3); its expression was decreased in the +/- sample and undetectable in the -/- sample. Longer exposure of the blot did not reveal the presence of any truncated mRNA species in the +/- and -/- lanes. The same blot was hybridized to an actin cDNA (lower panel), demonstrating roughly equivalent loading of the three RNA samples. Figure 10D shows the hybridization of the same probe to adult mouse tissues, revealing an approximately 4 kb transcript in brain (RFX4_v3), a 3.7 kb transcript in testis, and a still smaller transcript in liver. Figure 10E shows the pattern of developmental expression of the 4 kb transcript, which was undetectable in whole embryos at E7.5, highly expressed in whole embryos at E9.5 and E10.5, and less well expressed at E13.5 and E14.5. The brain, liver and testis lanes from Figure 10D are juxtaposed in Figure 10E to illustrate the difference in size between the brain (RFX4_v3), liver and testis transcripts, and the size identity of the adult brain transcript and the embryonic transcript. Also shown is the expression of a control mRNA for cyclophilin (Cyclo.).
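The primer coordinates and genotype calls in Figure 10B involve simple bookkeeping, sketched below. The first part checks that each primer's length equals the span of the base-pair coordinates given for it. The second part is an illustrative reading of the PCR results. It assumes that the insertion site-specific primers amplify only an uninterrupted endogenous allele, and that the transgene-specific primers amplify only when the transgene is present. The legend implies this but does not state it outright. The function names and the "no call" label are hypothetical.

```python
from typing import Dict, Tuple

# SEQ ID NO -> (primer sequence 5'->3', first bp, last bp), as given in the legend.
PRIMERS: Dict[int, Tuple[str, int, int]] = {
    29: ("AGCCAGTAATAAGAACTGCAGA", 264, 285),    # cytochrome P450 cDNA, NM_000775.2
    30: ("GGCACTCTTAGCAAACCTCAGG", 5225, 5246),  # alpha-myosin heavy chain promoter, MMU71441
    31: ("CATGGAAAGGGCAGAGTGAGC", 732, 752),     # mouse trace archive sequence
    32: ("GGCCATTGTCACCACTCGTAA", 323, 343),     # mouse trace archive sequence
}


def primer_lengths_consistent() -> bool:
    """True if every primer's length matches its inclusive coordinate span."""
    return all(len(seq) == last - first + 1 for seq, first, last in PRIMERS.values())


def call_genotype(transgene_band: bool, insertion_site_band: bool) -> str:
    """Assign a genotype from the two PCR assays (assumed interpretation)."""
    if insertion_site_band and not transgene_band:
        return "+/+"   # only intact endogenous alleles
    if insertion_site_band and transgene_band:
        return "+/-"   # one intact and one interrupted allele
    if transgene_band:
        return "-/-"   # both alleles interrupted by the transgene
    return "no call"   # neither product: treat as a failed reaction


if __name__ == "__main__":
    print("primer lengths consistent:", primer_lengths_consistent())
    for tg, ins in [(False, True), (True, True), (True, False)]:
        print("transgene:", tg, "insertion site:", ins, "->", call_genotype(tg, ins))
```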

Figure 11 is a digital image of a Northern analysis of RFX4 transcript expression using transcript-specific probes. cDNA probes corresponding to multiple or single RFX4 transcript variants, as indicated on the bottom of the figure, were used to probe northern blots containing total cellular RNA from adult testes (T), liver (L) and brain (B), or from brains of E18 mice of the +/+, +/- and -/- genotypes, as indicated. The blots were aligned to demonstrate the positions of the three hybridizing RFX4 species v1, v2 and v3 (arrows), as well as an uncharacterized transcript seen in adult mouse liver. There was no detectable hybridization of the specific v1 and v2 probes to the E18 brain RNA of any genotype.

Figure 12 is a set of digital images showing the developmental expression of RFX4_v3. Figures 12A-E are digital images of wholemount embryos at the indicated embryonic days (E) in which the RFX4_v3 transcript is indicated by the blue digoxigenin staining. For Figures 12A and B, the abbreviations are: mb, midbrain; fb, forebrain; hb, hindbrain. In Figure 12C, the wholemount suggests minimal staining rostral of the zona limitans (zl); however, a section through the plane indicated as C shows staining of the dorsal cortex (cx). Other abbreviations in Figure 12C are: te, telencephalon; me, mesencephalon; rb, rhombencephalon; sc, spinal cord. Figures 12D and E are digital images of wholemounts at E10.5, whereas Figure 12F is a digital image of a midline sagittal section, and Figures 12G-I are digital images of coronal sections, through similar embryos. New abbreviations in Figures 12D-I are: di, diencephalon; cb, cerebellum; cp/lt, commissural plate/lamina terminalis; LGE, lateral ganglionic eminence; MGE, medial ganglionic eminence; ch, choroid plexus; R, retina; os, optic stalk; DT, dorsal thalamus; VT, ventral thalamus; HY, hypothalamus; V, trigeminal ganglion; VII/VIII, facial/vestibular ganglion. The arrowheads in Figures 12F-H indicate the loss of expression in the telencephalic dorsal midline at E10.5. Figures 12J-M show one sagittal (J) and three caudal to rostral coronal sections through the head at E12.5. Note the lack of staining in the telencephalic dorsal midline (arrowheads in J, K), in the epiphysis (ep) in L, and in the fourth ventricle choroid plexus (ch) in M. Scale bars for (A-M), 500 µm.

Figure 13 is a set of digital images showing RFX4_v3 in situ staining in the region of the developing SCO. Figures 13A-D show progressively rostral to caudal sections through the brain of a normal embryo at E16.5. Abbreviations are the same as in the legend to Figure 12 except for me (mesencephalon), cb (cerebellum), and P (pituitary). The box labeled F in section C contains the SCO and the aqueduct of Sylvius; this is shown enlarged in F at E16.5. The same region is shown at E14.5 (E) and at the time of birth (P0) (G). Note the high level expression of the RFX4_v3 transcript in the region of the developing SCO in E, and in the SCO itself in F and G. Scale bars for (A-D), 500 µm; (E-G), 100 µm.

Figure 14 is a set of digital images showing the head morphology of -/- mice at E12.5. Figure 14A shows heads from two E12.5 littermates after fixation, one hemizygous (HE) and one KO (-/-) as indicated. Note the near normal appearance of the eyes and the facial structures, but the clearly abnormal doming of the skull and the smaller head of the -/- littermate. Figure 14B shows coronal sections from WT (top row) and KO (bottom row) littermate mice at E12.5, stained with hematoxylin and eosin. In the most rostral (R) sections (left panels), the brains appear somewhat similar, showing both lateral ventricles (LV) and apparently normal midline structures, although the brains were somewhat smaller in the KO mice. In more caudal (Figure 14C) sections (middle two panels), however, there was a striking loss of midline structures and the formation of a single central ventricle. In still more caudal sections (right panels), taken at the level of the retinas, there were continued striking abnormalities and loss of essentially all dorsal midline structures. Other abbreviations: MF, interhemispheric fissure; Cing. cortex, cingulate cortex; Gang. em., ganglionic eminence; PC, posterior commissure; Epithal., epithalamus; Hip., hippocampus; Hypothal., hypothalamus.

Figure 15 is a set of digital images showing expression of molecular markers in WT and littermate KO mice at E12.5. Shown are the in situ hybridization staining patterns of sagittal (Figures 15A-C) and coronal sections through WT (+/- or +/+) and KO (-/-) heads at E12.5. The digoxigenin staining indicates the presence of the specific transcript being evaluated. New abbreviations not found in the legends of Figures 12-14 include: se, septum; IN, infundibulum; lt, lamina terminalis; is, isthmus; hem, cortical hem. Note that Fgf8 expression is maintained in the isthmus (is), infundibulum (IN), lamina terminalis (lt) and septum (se), but is lost in the choroid plexus (ch) of the forebrain (C and C'). The asterisks in Figures 15D' and E' indicate the decrease in Msx2 expression (D, D') and the lack of WntSa expression (E, E') in the dorsal midline of the KO embryos. Scale bars for (A-I), 500 µm.

BRIEF DESCRIPTION OF THE SEQUENCE LISTINGS

The nucleic acid and protein sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and three-letter code for amino acids, as defined in 37 C.F.R. 1.822. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood as included by any reference to the displayed strand. In the accompanying sequence listing:

SEQ ID NO: 1 shows the nucleic acid sequence of human RFX4_v2 (GenBank Accession No. NM_002920).

SEQ ID NO: 2 shows the amino acid sequence of human RFX4_v2 (GenBank Accession No. NP_002911.2).

SEQ ID NO: 3 shows the nucleic acid sequence of human RFX4_v1 (GenBank Accession No. AF332192).

SEQ ID NO: 4 shows the amino acid sequence of human RFX4_v1 (GenBank Accession No. AAK17191.1).

SEQ ID NO: 5 shows a nucleic acid sequence of murine RFX4_v3 (GenBank Accession No. AY102010), including untranslated sequences.

SEQ ID NO: 6 shows the amino acid sequence of murine RFX4_v3. 

SEQ ID NO: 7 shows a nucleic acid sequence of human RFX4_v3 (GenBank Accession No. AY102009), including untranslated sequences.

SEQ ID NO: 8 shows the amino acid sequence of human RFX4_v3. 
SEQ ID NO: 9 shows a nucleic acid sequence of zebrafish RFX4_v3 (GenBank Accession No. AY102011), including untranslated sequences.

SEQ ID NO: 10 shows the amino acid sequence of zebrafish RFX4_v3. 
SEQ ID NO: 11 shows the nucleic acid sequence of the proximal promoter of human RFX4_v3.

SEQ ID NO: 12 shows the nucleic acid sequence of the proximal promoter of murine 
RFX4_v3. 

SEQ ID NO: 13 shows the N-terminal amino acid sequence of zebrafish RFX4_v3. 

SEQ ID NO: 14 shows a nucleic acid sequence of human RFX4_v1 (GenBank Accession No. NM_032491), including untranslated sequences.

SEQ ID NO: 15 shows the amino acid sequence of human RFX4_v1 (GenBank Accession No. NP_115880).

SEQ ID NO: 16 shows the first forward primer for RFX4_v1.

SEQ ID NO: 17 shows the first reverse primer for RFX4_v1.

SEQ ID NO: 18 shows the second forward primer for RFX4_v1.

SEQ ID NO: 19 shows the second reverse primer for RFX4_v1.

SEQ ID NO: 20 shows the forward primer for mouse RFX4_v3.

SEQ ID NO: 21 shows the reverse primer for mouse RFX4_v3. 
SEQ ID NO: 22 shows the first forward primer for human RFX4_v2.

SEQ ID NO: 23 shows the reverse primer for human RFX4_v2. 

SEQ ID NO: 24 shows the second forward primer for human RFX4_v2. 

SEQ ID NO: 25 shows the first forward nested BSIRFRX4-specific primer. 

SEQ ID NO: 26 shows the first reverse nested BSIRFRX4-specific primer. 
SEQ ID NO: 27 shows the second forward nested BSIRFRX4-specific primer.

SEQ ID NO: 28 shows the second reverse nested BSIRFRX4-specific primer. 

SEQ ID NO: 29 shows the transgene-specific forward primer. 

SEQ ID NO: 30 shows the transgene-specific reverse primer. 

SEQ ID NO: 31 shows the insertion site specific forward primer. 
SEQ ID NO: 32 shows the insertion site specific reverse primer.

SEQ ID NO: 33 shows amino acids 1-14 from the N-terminus of human RFX4_v3. 

SEQ ID NO: 34 shows amino acids 1-14 from the N-terminus of murine RFX4_v3. 

SEQ ID NO: 35 shows amino acids 1-14 from the N-terminus of zebrafish RFX4_v3. 

SEQ ID NO: 36 shows a portion of human chromosome 12 genomic clone NT_009720.

SEQ ID NO: 37 shows the nucleic acid coding sequence that encodes the human RFX4_v3 amino acid sequence shown in SEQ ID NO: 8.

SEQ ID NO: 38 shows the nucleic acid coding sequence that encodes the murine RFX4_v3
amino acid sequence shown in SEQ ID NO: 6. 

SEQ ID NO: 39 shows the nucleic acid coding sequence that encodes the zebrafish RFX4_v3 amino acid sequence shown in SEQ ID NO: 10.
TERMS



Unless otherwise noted, technical terms are used according to conventional usage. Definitions 
of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by 
Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of
Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. 
Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by 
VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8). 

In order to facilitate review of the various embodiments, the following explanations of specific 
terms are provided: 

The term "gene" refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences necessary for the production of a polypeptide or precursor (e.g., RFX4_v3). The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence, so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, etc.) of the full-length polypeptide or fragment are retained. The term also encompasses the coding region of a structural gene and includes sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb on either end, such that the gene corresponds to the length of the full-length mRNA. The sequences that are located 5' of the coding region and that are present in the mRNA are referred to as 5' untranslated sequences. The sequences that are located 3' or downstream of the coding region and that are present in the mRNA are referred to as 3' untranslated sequences. The term "gene" encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene that are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

In particular, the term "RFX4_v3 gene" refers to the full-length RFX4_v3 nucleotide
sequence (e.g., nucleotides 2,829,445 to 2,991,076 of Accession no. NTJB5235, Human
Chromosome 12 Genomic Contig; or nucleotides 2,737,642 to 2,889,558 of Accession no. NT_039498,
Mouse Chromosome 10 Genomic Contig). However, it is also intended that the term encompass
fragments of the RFX4_v3 sequence, as well as other domains within the full-length RFX4_v3
nucleotide sequence. Furthermore, the terms "RFX4_v3 nucleotide sequence" or "RFX4_v3
polynucleotide sequence" encompass DNA, cDNA, and RNA (e.g., mRNA) sequences.

Where amino acid sequence is recited herein to refer to an amino acid sequence of a naturally 
occurring protein molecule, amino acid sequence and like terms, such as polypeptide or protein are not 
meant to limit the amino acid sequence to the complete, native amino acid sequence associated with the 
recited protein molecule. 

As used herein, a RFX4_v3 polypeptide is an amino acid sequence, for example, SEQ ID NO:
6, SEQ ID NO: 8, or SEQ ID NO: 10, or a variant amino acid sequence with substantial sequence
identity to SEQ ID NO: 6, SEQ ID NO: 8, or SEQ ID NO: 10, for example 70%, 75%, 80%, 85%,
90%, 95%, or 98% sequence identity. In some embodiments, the RFX4_v3 polypeptide retains at least
one RFX4_v3 activity. As used herein, a RFX4_v3 activity is an activity that promotes the
development of the brain's ventricular system, the absence of which activity is demonstrated by the
development of hydrocephalus. In one embodiment, the RFX4_v3 activity is the inhibition of the
phenotypic expression of congenital hydrocephalus. In another example, the RFX4_v3 activity is the
ability to bind to RFX4_v3 specific antibodies. Screening for a RFX4_v3 activity can be accomplished
by, for example, screening for the morphological or behavioral signs of hydrocephalus, or screening for
binding to RFX4_v3 antibodies (see below).

As used herein, "abnormal" refers to a difference from wild-type, particularly a difference that
results in expression of a protein that is associated with a disease condition. For example, "abnormal
expression" refers to a perturbation in the level at which a particular protein is expressed, for example
an increase or decrease in expression as compared to a wild-type level of expression. An "abnormal
RFX4_v3 polypeptide" refers to such a difference in either the protein itself or the level of its
expression, or a difference in the nucleic acid that encodes the protein and results in the abnormality.

In addition to containing introns, genomic forms of a gene may also include sequences located
on both the 5' and 3' end of the sequences that are present on the RNA transcript. These sequences are
referred to as "flanking" sequences or regions (these flanking sequences are located 5' or 3' to the
non-translated sequences present on the mRNA transcript). The 5' flanking region may contain regulatory
sequences such as promoters and enhancers that control or influence the transcription of the gene. The
3' flanking region may contain sequences that direct the termination of transcription,
post-transcriptional cleavage and polyadenylation.

The term "wild-type" refers to a gene or gene product that has the characteristics of that gene
or gene product when isolated from a naturally occurring source. A wild-type gene is that which is
most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type"
form of the gene. In contrast, the terms "modified," "mutant," and "variant" refer to a gene or gene
product that displays modifications in sequence and/or functional properties (i.e., altered
characteristics) when compared to the wild-type gene or gene product. It is noted that naturally
occurring mutants can be isolated; these are identified by the fact that they have altered characteristics
when compared to the wild-type gene or gene product.

As used herein, the term "heterozygous" refers to having different alleles at a corresponding 
chromosomal locus. 

As used herein, the term "homozygous" refers to having identical alleles at a corresponding
chromosomal locus.

As used herein, the terms "nucleic acid molecule encoding," "DNA sequence encoding," and
"DNA encoding" refer to the order or sequence of deoxyribonucleotides along a strand of
deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids
along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.

DNA molecules are said to have "5' ends" and "3' ends" because mononucleotides are reacted
to make oligonucleotides or polynucleotides in a manner such that the 5' phosphate of one
mononucleotide pentose ring is attached to the 3' oxygen of its neighbor in one direction via a
phosphodiester linkage. Therefore, an end of an oligonucleotide or polynucleotide is referred to as the
"5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring, and as the
"3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose ring.
As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide or polynucleotide,
also may be said to have 5' and 3' ends. In either a linear or circular DNA molecule, discrete elements
are referred to as being "upstream" or 5' of the "downstream" or 3' elements. This terminology
reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA strand. The promoter
and enhancer elements that direct transcription of a linked gene are generally located 5' or upstream of
the coding region. However, enhancer elements can exert their effect even when located 3' of the
promoter element and the coding region. Transcription termination and polyadenylation signals are
located 3' or downstream of the coding region.

As used herein, the terms "an oligonucleotide having a nucleotide sequence encoding a gene"
and "polynucleotide having a nucleotide sequence encoding a gene" mean a nucleic acid sequence
comprising the coding region of a gene or, in other words, the nucleic acid sequence that encodes a
gene product. The coding region may be present in either a cDNA, genomic DNA, or RNA form.
When present in a DNA form, the oligonucleotide or polynucleotide may be single-stranded (i.e., the
sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice
junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the
gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA
transcript. Alternatively, the coding region utilized in the expression vectors of the present disclosure
may contain endogenous enhancers/promoters, splice junctions, intervening sequences,
polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

As used herein, the term "regulatory element" refers to a genetic element that controls some
aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element
that facilitates the initiation of transcription of an operably linked coding region. Other regulatory
elements include splicing signals, polyadenylation signals, termination signals, etc.

As used herein, the terms "complementary" or "complementarity" are used in reference to
polynucleotides (i.e., a sequence of nucleotides) related by the base-pairing rules. For example, the
sequence 5'-A-G-T-3' is complementary to the sequence 3'-T-C-A-5'. Complementarity may be
"partial," in which only some of the nucleic acids' bases are matched according to the base-pairing
rules, or there may be "complete" or "total" complementarity between the nucleic acids. The degree
of complementarity between nucleic acid strands has significant effects on the efficiency and strength
of hybridization between nucleic acid strands. This is of particular importance in amplification
reactions, as well as detection methods that depend upon binding between nucleic acids.
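The base-pairing rules can be illustrated with a short sketch. The following Python is illustrative only: the function names are hypothetical, only the four DNA bases are handled, and the fraction of paired positions is used as a simple stand-in for "partial" versus "complete" complementarity between two equal-length strands written in antiparallel orientation.

```python
# Watson-Crick pairs for the four DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same direction as the input.

    For an input written 5'->3', the result is the partner strand written 3'->5'.
    """
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Partner strand rewritten in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def fraction_paired(top: str, bottom: str) -> float:
    """Fraction of positions where `top` (5'->3') pairs with `bottom` (3'->5').

    1.0 corresponds to complete complementarity; values between 0 and 1 to partial.
    """
    if len(top) != len(bottom) or not top:
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum(PAIRS[x] == y for x, y in zip(top.upper(), bottom.upper()))
    return matches / len(top)


print(complement("AGT"))              # TCA, i.e., 3'-T-C-A-5'
print(reverse_complement("AGT"))      # ACT, i.e., 5'-A-C-T-3'
print(fraction_paired("AGT", "TCA"))  # 1.0 (complete)
print(fraction_paired("AGT", "TCG"))  # 2 of 3 positions pair (partial)
```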

The term "homology" refers to a degree of complementarity. There may be partial homology
or complete homology (i.e., identity). A partially complementary sequence is one that at least partially
inhibits a completely complementary sequence from hybridizing to a target nucleic acid and is referred
to using the functional term "substantially homologous." The term "inhibition of binding," when used
in reference to nucleic acid binding, refers to inhibition of binding caused by competition of
homologous sequences for binding to a target sequence. The inhibition of hybridization of the
completely complementary sequence to the target sequence may be examined using a hybridization
assay (Southern or Northern blot, solution hybridization and the like) under conditions of low
stringency. A substantially homologous sequence or probe will compete for and inhibit the binding
(i.e., the hybridization) of a completely homologous sequence to a target under conditions of low
stringency. This is not to say that conditions of low stringency are such that non-specific binding is
permitted; low stringency conditions require that the binding of two sequences to one another be a
specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a
second target that lacks even a partial degree of complementarity (e.g., less than about 30% identity); in
the absence of non-specific binding the probe will not hybridize to the second non-complementary
target.

Those of skill in the art know that numerous equivalent conditions may be employed to
comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base
composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution
or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or
absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization
solution may be varied to generate conditions of low stringency hybridization different from, but
equivalent to, the above listed conditions. In addition, those of skill in the art know conditions that
promote hybridization under conditions of high stringency (e.g., increasing the temperature of the
hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.).

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or 
genomic clone, the term "substantially homologous" refers to any probe that can hybridize to either or 
both strands of the double-stranded nucleic acid sequence under conditions of low stringency as 
described above. 

A gene may produce multiple RNA species that are generated by differential splicing of the
primary RNA transcript. cDNAs that are splice variants of the same gene will contain regions of
sequence identity or complete homology (representing the presence of the same exon or portion of the
same exon on both cDNAs) and regions of complete non-identity (for example, representing the
presence of exon "A" on cDNA 1 whereas cDNA 2 contains exon "B" instead). Because the two
cDNAs contain regions of sequence identity, they will both hybridize to a probe derived from the entire
gene or portions of the gene containing sequences found on both cDNAs; the two splice variants are
therefore substantially homologous to such a probe and to each other.

When used in reference to a single-stranded nucleic acid sequence, the term "substantially
homologous" refers to any probe that can hybridize to (i.e., is the complement of) the single-stranded
nucleic acid sequence under conditions of low stringency as described above.

As used herein, a specific binding agent is an agent that binds substantially only to a defined
target. Thus a RFX4_v3-specific binding agent binds substantially only to the RFX4_v3 RNA or DNA
sequence, or the RFX4_v3 polypeptide. As used herein, the phrase RFX4_v3-specific binding agent
includes anti-RFX4_v3 protein antibodies and other agents (such as nucleic acids) that bind
substantially only to the RFX4_v3 nucleic acid sequence or polypeptide. As used herein, "specific
binding" includes specific hybridization.

As used herein, the term "competes for binding" is used in reference to a first polypeptide
with an activity which binds to the same substrate as does a second polypeptide with an activity, where
the second polypeptide is a variant of the first polypeptide or a related or dissimilar polypeptide. The
efficiency (e.g., kinetics or thermodynamics) of binding by the first polypeptide may be the same as,
greater than, or less than the efficiency of substrate binding by the second polypeptide. For example, the
equilibrium binding constant (Km) for binding to the substrate may be different for the two
polypeptides. The term "Km" as used herein refers to the Michaelis-Menten constant for an enzyme
and is defined as the concentration of the specific substrate at which a given enzyme yields one-half its
maximum velocity in an enzyme-catalyzed reaction.
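The definition of Km follows from the Michaelis-Menten rate equation, v = Vmax[S]/(Km + [S]): substituting [S] = Km gives v = Vmax/2. A minimal sketch of that relationship, with arbitrary illustrative values for Vmax and Km (these are not measured values for any RFX4_v3 polypeptide, and the function name is hypothetical):

```python
def michaelis_menten_rate(substrate: float, v_max: float, k_m: float) -> float:
    """Initial reaction velocity v = Vmax * [S] / (Km + [S])."""
    if substrate < 0 or k_m <= 0:
        raise ValueError("substrate must be >= 0 and k_m must be > 0")
    return v_max * substrate / (k_m + substrate)


# Arbitrary illustrative values (any consistent units): Vmax = 10, Km = 2.
v_max, k_m = 10.0, 2.0
print(michaelis_menten_rate(k_m, v_max, k_m))   # 5.0, one-half of Vmax at [S] = Km
print(michaelis_menten_rate(18.0, v_max, k_m))  # 9.0, approaching Vmax at high [S]
```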

As used herein, the term "hybridization" is used in reference to the pairing of complementary
nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association
between the nucleic acids) are affected by such factors as the degree of complementarity between the
nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio
within the nucleic acids.

As used herein, the term "Tm" is used in reference to the "melting temperature." The melting
temperature is the temperature at which a population of double-stranded nucleic acid molecules
becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acids is
well known in the art. As indicated by standard references, a simple estimate of the Tm value may be
calculated by the equation Tm = 81.5 + 0.41 (% G + C) when a nucleic acid is in aqueous solution at
1 M NaCl (see, e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid
Hybridization [1985]). Other references include more sophisticated computations that take structural
as well as sequence characteristics into account for the calculation of Tm.
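A minimal sketch of the simple estimate above, assuming an unmodified DNA sequence in 1 M NaCl. It has no term for length, mismatches, or formamide, which the more sophisticated computations take into account; the function names and example sequences are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def simple_tm(seq: str) -> float:
    """Simple Tm estimate in deg C: 81.5 + 0.41 * (%G+C), for 1 M NaCl.

    No length, mismatch, or formamide correction is applied.
    """
    return 81.5 + 0.41 * gc_percent(seq)


for example in ["AATGC" * 8, "ATGC" * 10]:
    print(f"{gc_percent(example):.0f}% G+C -> Tm about {simple_tm(example):.1f} deg C")
```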

As used herein the term "stringency" is used in reference to the conditions of temperature,
ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid
hybridizations are conducted. Those skilled in the art will recognize that "stringency" conditions may
be altered by varying the parameters just described either individually or in concert. With "high
stringency" conditions, nucleic acid base pairing will occur only between nucleic acid fragments that
have a high frequency of complementary base sequences (e.g., hybridization under "high stringency"
conditions may occur between homologs with about 85-100% identity, preferably about 70-100%
identity). With medium stringency conditions, nucleic acid base pairing will occur between nucleic
acids with an intermediate frequency of complementary base sequences (e.g., hybridization under
"medium stringency" conditions may occur between homologs with about 50-70% identity). Thus,
conditions of "weak" or "low" stringency are often required with nucleic acids that are derived from
organisms that are genetically diverse, as the frequency of complementary sequences is usually less.

"High stringency conditions" when used in reference to nucleic acid hybridization comprise
conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l
NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X
Denhardt's reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a solution
comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is
employed.

"Medium stringency conditions" when used in reference to nucleic acid hybridization
comprise conditions equivalent to binding or hybridization at 42°C in a solution consisting of 5X SSPE
(43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS,
5X Denhardt's reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a solution
comprising 1.0X SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is
employed.

"Low stringency conditions" comprise conditions equivalent to binding or hybridization at
42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH
adjusted to 7.4 with NaOH), 0.1% SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml:
5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured salmon
sperm DNA followed by washing in a solution comprising 5X SSPE, 0.1% SDS at 42°C when a probe
of about 500 nucleotides in length is employed.
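The three wash conditions above differ mainly in SSPE strength (0.1X, 1.0X, and 5X), and lowering the salt concentration of the wash raises its stringency. The sketch below converts the 5X SSPE recipe into approximate molar concentrations at each strength. It assumes the EDTA is the disodium dihydrate salt (372.24 g/mol), which the recipe does not specify; the other molar masses are standard values, and the function name is hypothetical.

```python
# Molar masses (g/mol). The EDTA salt form is an assumption: the recipe gives
# only "1.85 g/l EDTA", which corresponds to ~5 mM disodium EDTA dihydrate.
MOLAR_MASS = {
    "NaCl": 58.44,
    "NaH2PO4.H2O": 137.99,
    "Na2EDTA.2H2O": 372.24,
}

# Grams per liter in 5X SSPE, as given in the recipe above.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "Na2EDTA.2H2O": 1.85}


def sspe_molarities(strength_x: float) -> dict:
    """Approximate molar concentration of each SSPE component at `strength_x`."""
    scale = strength_x / 5.0  # the recipe is written for 5X
    return {
        name: grams / MOLAR_MASS[name] * scale
        for name, grams in SSPE_5X_G_PER_L.items()
    }


# Hybridization buffer and the wash buffers for each stringency level above.
for label, strength in [("hybridization", 5.0), ("high-stringency wash", 0.1),
                        ("medium-stringency wash", 1.0), ("low-stringency wash", 5.0)]:
    m = sspe_molarities(strength)
    print(f"{label} ({strength}X SSPE): "
          f"NaCl {m['NaCl'] * 1000:.1f} mM, "
          f"phosphate {m['NaH2PO4.H2O'] * 1000:.1f} mM, "
          f"EDTA {m['Na2EDTA.2H2O'] * 1000:.2f} mM")
```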

The following terms are used to describe the sequence relationships between two or more
polynucleotides: "reference sequence," "sequence identity," "percentage of sequence identity," and
"substantial identity." A "reference sequence" is a defined sequence used as a basis for a sequence
comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of a
full-length cDNA sequence given in a sequence listing or may comprise a complete gene sequence.
Generally, a reference sequence is at least 20 nucleotides in length, frequently at least 25 nucleotides in
length, and often at least 50 nucleotides in length. Since two polynucleotides may each (1) comprise a
sequence (i.e., a portion of the complete polynucleotide sequence) that is similar between the two
polynucleotides, and (2) further comprise a sequence that is divergent between the two
polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed
by comparing sequences of the two polynucleotides over a "comparison window" to identify and
compare local regions of sequence similarity. A "comparison window," as used herein, refers to a
conceptual segment of at least 20 contiguous nucleotide positions wherein a polynucleotide sequence
may be compared to a reference sequence of at least 20 contiguous nucleotides and wherein the portion
of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e.,
gaps) of 20 percent or less as compared to the reference sequence (which does not comprise additions
or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning
a comparison window may be conducted by the local homology algorithm of Smith and Waterman
(Smith and Waterman, Adv. Appl. Math., 2:482 [1981]), by the homology alignment algorithm of
Needleman and Wunsch (Needleman and Wunsch, J. Mol. Biol., 48:443 [1970]), by the search for
similarity method of Pearson and Lipman (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444
[1988]), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and
TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575
Science Dr., Madison, WI), or by inspection, and the best alignment (i.e., resulting in the highest
percentage of homology over the comparison window) generated by the various methods is selected.
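As an illustration of the local homology approach of Smith and Waterman, the following sketch fills the dynamic-programming score matrix and traces back the best-scoring local alignment. It is a simplified teaching version, not the GAP or BESTFIT implementations cited above: the scoring scheme (match +2, mismatch -1, linear gap penalty -1) is an arbitrary assumption, whereas practical tools typically use substitution matrices and affine gap penalties. The global Needleman and Wunsch method differs mainly in not clamping cell scores at zero and in tracing back from the final cell.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps in the alignment are shown as "-".
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at zero
    best, best_pos = 0, (0, 0)

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; clamping at zero lets an alignment start anywhere (local).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,                   # gap in b
                H[i][j - 1] + gap,                   # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```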



The term "sequence identity" means that two polynucleotide sequences are identical (i.e., on a
nucleotide-by-nucleotide basis) over the window of comparison. The term "percentage of sequence
identity" is calculated by comparing two optimally aligned sequences over the window of comparison,
determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I)
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison (i.e., the window size), and
multiplying the result by 100 to yield the percentage of sequence identity. The term "substantial
identity" as used herein denotes a characteristic of a polynucleotide sequence, wherein the
polynucleotide comprises a sequence that has at least 85 percent sequence identity, preferably at least
90 to 95 percent sequence identity, more usually at least 99 percent sequence identity as compared to a
reference sequence over a comparison window of at least 20 nucleotide positions, frequently over a
window of at least 25-50 nucleotides, wherein the percentage of sequence identity is calculated by
comparing the reference sequence to the polynucleotide sequence, which may include deletions or
additions that total 20 percent or less of the reference sequence over the window of comparison. The
reference sequence may be a subset of a larger sequence, for example, as a segment of the full-length
sequences of the compositions claimed in the present disclosure (e.g., RFX4_v3).
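The percentage-of-identity calculation described above can be sketched as follows, assuming the two sequences have already been optimally aligned (for example, by one of the methods listed above) and that gaps are written as "-". Counting gap columns as positions in the window, and measuring the 20 percent gap limit against the ungapped reference length, are interpretations assumed for illustration; the function names are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Matched positions / window size * 100 for two aligned, equal-length strings.

    A position counts as matched only if the same base occurs in both sequences;
    gap columns count toward the window size but never as matches.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_ref:
        raise ValueError("empty comparison window")
    matched = sum(
        r == q and r != "-"
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
    )
    return 100.0 * matched / len(aligned_ref)


def gap_fraction(aligned_ref: str, aligned_query: str) -> float:
    """Total additions plus deletions relative to the ungapped reference length."""
    gaps = sum(r == "-" or q == "-" for r, q in zip(aligned_ref, aligned_query))
    return gaps / len(aligned_ref.replace("-", ""))


ref = "ACGTACGTACGTACGTACGT"    # 20-position comparison window
query = "ACGTACG-ACGTACGTACGT"  # one deletion relative to the reference
print(percent_identity(ref, query))     # 95.0
print(gap_fraction(ref, query) <= 0.20)  # True
```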

As applied to polypeptides, the term "substantial identity" means that two peptide sequences,
when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, share at
least 70 percent sequence identity, at least 80 percent sequence identity, preferably at least 90 percent
sequence identity, more preferably at least 95 percent sequence identity or more (e.g., 99 percent
sequence identity). Preferably, residue positions which are not identical differ by conservative amino
acid substitutions. Conservative amino acid substitutions refer to the interchangeability of residues
having similar side chains. For example, a group of amino acids having aliphatic side chains is
glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side
chains is serine and threonine; a group of amino acids having amide-containing side chains is
asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine,
tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and
histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine.
Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine,
phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
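The preferred substitution groups listed above can be applied position by position to two aligned polypeptide sequences. A minimal sketch using one-letter amino acid codes; the function names and example sequences are hypothetical (not RFX4_v3), and gapped alignments are not handled.

```python
# Preferred conservative substitution groups listed above, as one-letter codes.
PREFERRED_GROUPS = [
    {"V", "L", "I"},  # valine-leucine-isoleucine
    {"F", "Y"},       # phenylalanine-tyrosine
    {"K", "R"},       # lysine-arginine
    {"A", "V"},       # alanine-valine
    {"N", "Q"},       # asparagine-glutamine
]


def is_preferred_conservative(a: str, b: str) -> bool:
    """True if residues a and b differ but fall in the same preferred group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in group and b in group for group in PREFERRED_GROUPS)


def classify_positions(reference: str, variant: str) -> list:
    """Label each position of two aligned, equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for r, v in zip(reference.upper(), variant.upper()):
        if r == v:
            labels.append("identical")
        elif is_preferred_conservative(r, v):
            labels.append("conservative")
        else:
            labels.append("other")
    return labels


# Arbitrary example sequences (not RFX4_v3).
print(classify_positions("MKVLFD", "MRILYE"))
```

Note that valine appears in two groups, so alanine-valine and valine-leucine are both treated as conservative, while alanine-leucine is not.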

As used herein, the term "mutagenize" refers to any method of inducing a mutation in an
RNA, DNA or amino acid sequence. Methods of mutagenization include, but are not limited to,
chemical mutagenization, for example using bromouracil, nitrous acid, nitrosoguanidine, methyl
methanesulfonate, ethyl methanesulfonate, acridine orange, proflavin, or ethidium bromide, or
irradiation, for example ultraviolet irradiation.

The term "fragment" as used herein refers to a polypeptide that has an amino-terminal and/or
carboxy-terminal deletion as compared to the native protein, but where the remaining amino acid
sequence is identical to the corresponding positions in the amino acid sequence deduced from a
full-length cDNA sequence. Fragments typically are at least 4 amino acids long, preferably at least 20
amino acids long, usually at least 50 amino acids long or longer, and span the portion of the
polypeptide required for intermolecular binding of the compositions (claimed in the present disclosure)
with its various ligands and/or substrates.
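The sequence part of the "fragment" definition above (an amino- and/or carboxy-terminal deletion, with the remaining residues identical to the native sequence) amounts to a contiguous-substring test. A minimal sketch; the function name and the example sequence are hypothetical (not RFX4_v3), and the function checks sequence only, not whether the fragment retains binding activity.

```python
def fragment_span(native: str, candidate: str, min_length: int = 4):
    """Return (start, end) of `candidate` within `native` if it is a fragment, else None.

    A fragment here is a contiguous stretch of the native sequence that is shorter
    than the native sequence, i.e., what remains after an amino-terminal deletion,
    a carboxy-terminal deletion, or both. `min_length` reflects the "at least 4
    amino acids" lower bound above. If the stretch occurs more than once, the
    first (most amino-terminal) occurrence is reported.
    """
    native, candidate = native.upper(), candidate.upper()
    if len(candidate) < min_length or len(candidate) >= len(native):
        return None
    start = native.find(candidate)
    if start == -1:
        return None  # not identical to any contiguous stretch of the native sequence
    return start, start + len(candidate)


native = "MASTKLPQEDRW"  # arbitrary example sequence, not RFX4_v3
span = fragment_span(native, "TKLPQED")
if span is not None:
    start, end = span
    print(f"amino-terminal deletion: {start} residues; "
          f"carboxy-terminal deletion: {len(native) - end} residues")
```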

The term "polymorphic locus" is a locus present in a population that shows variation between 
members of the population (i.e., the most common allele has a frequency of less than 0.95). In contrast, 
a "monomorphic locus" is a genetic locus at which little or no variation is seen between members of the 
population (generally taken to be a locus at which the most common allele exceeds a frequency of 0.95 
in the gene pool of the population). 
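The 0.95 frequency cut-off above can be applied directly to allele counts. A minimal sketch with hypothetical allele calls; because the definitions above call a locus polymorphic below 0.95 and monomorphic above 0.95, a most-common-allele frequency of exactly 0.95 is labeled monomorphic here, which is an assumption.

```python
from collections import Counter


def classify_locus(observed_alleles, threshold: float = 0.95):
    """Return ("polymorphic" or "monomorphic", frequency of the most common allele).

    A locus whose most common allele sits exactly at the threshold is labeled
    monomorphic here; that boundary choice is an assumption.
    """
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    top_frequency = counts.most_common(1)[0][1] / total
    label = "polymorphic" if top_frequency < threshold else "monomorphic"
    return label, top_frequency


# Hypothetical allele calls from 100 chromosomes at two loci.
print(classify_locus(["A"] * 90 + ["G"] * 10))  # most common allele at 0.90
print(classify_locus(["C"] * 98 + ["T"] * 2))   # most common allele at 0.98
```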

As used herein, the term "polymorphism information" refers to the presence or absence of one
or more polymorphisms (e.g., mutations) in a gene (e.g., the RFX4_v3 gene).

The term "naturally-occurring" as used herein as applied to an object refers to the fact that an
object can be found in nature. For example, a polypeptide or polynucleotide sequence that is present in 
an organism (including viruses) that can be isolated from a source in nature and which has not been 
intentionally modified by man in the laboratory is naturally-occurring. 

As used herein, "providing a polypeptide from a subject" includes providing any biological 
sample from the subject that includes a polynucleotide. Examples of suitable biological samples 
include samples of any type of tissue, for example brain, liver, lung, stomach, intestine, pancreas, bone, 
skin, spleen, kidney, ovary, testis, or connective tissue, or any body fluid, for example blood, serum, 
plasma, cerebrospinal fluid, tears, sweat, amniotic fluid, semen, urine, gastric and intestinal fluids,
saliva, mucus, or synovial fluid.

"Amplification" is a special case of nucleic acid replication involving template specificity. It 
is to be contrasted with non-specific template replication (i.e., replication that is template-dependent
but not dependent on a specific template). Template specificity is here distinguished from fidelity of
replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-)
specificity. Template specificity is frequently described in terms of "target" specificity. Target 
sequences are "targets" in the sense that they are sought to be sorted out from other nucleic acids. 
Amplification techniques have been designed primarily for this sorting out. 

An example of amplification is the polymerase chain reaction (see below). Other examples of 
in vitro amplification techniques include strand displacement amplification (see U.S. Patent No. 
5,744,311); transcription-free isothermal amplification (see U.S. Patent No. 6,033,881); repair chain
reaction amplification (see WO 90/01069); ligase chain reaction amplification (see EP-A-320 308); gap
filling ligase chain reaction amplification (see U.S. Patent No. 5,427,930); coupled ligase detection and
PCR (see U.S. Patent No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S.
Patent No. 6,025,134). 

Template specificity is achieved in most amplification techniques by the choice of enzyme.
Amplification enzymes are enzymes that, under the conditions in which they are used, will process only
specific sequences of nucleic acid in a heterogeneous mixture of nucleic acid. For example, in the case
of Qβ replicase, MDV-1 RNA is the specific template for the replicase (D.L. Kacian et al., Proc. Natl.
Acad. Sci. USA, 69:3038 [1972]). This amplification enzyme will not replicate other nucleic acid.
Similarly, in the case of T7 RNA polymerase, this amplification enzyme has a stringent specificity for its
own promoters (Chamberlin et al., Nature, 228:227 [1970]). In the case of T4 DNA ligase, the enzyme
will not ligate the two oligonucleotides or polynucleotides where there is a mismatch between the
oligonucleotide or polynucleotide substrate and the template at the ligation junction (D.Y. Wu and R.
B. Wallace, Genomics, 4:560 [1989]). Finally, Taq and Pfu polymerases, by virtue of their ability to
function at high temperature, are found to display high specificity for the sequences bounded and thus
defined by the primers; the high temperature results in thermodynamic conditions that favor primer
hybridization with the target sequences and not hybridization with non-target sequences (H. A. Erlich
(ed.), PCR Technology, Stockton Press [1989]).

As used herein, the term "amplifiable nucleic acid" is used in reference to nucleic acids that
may be amplified by any amplification method. It is contemplated that "amplifiable nucleic acid" will
usually comprise "sample template."

As used herein, the term "sample template" refers to nucleic acid originating from a sample
that is analyzed for the presence of "target" (defined below). In contrast, "background template" is
used in reference to nucleic acid other than sample template that may or may not be present in a
sample. Background template is most often inadvertent. It may be the result of carryover, or it may be
due to the presence of nucleic acid contaminants sought to be purified away from the sample. For
example, nucleic acids from organisms other than those to be detected may be present as background in
a test sample.

As used herein, the term "primer" refers to an oligonucleotide, whether occurring naturally as
in a purified restriction digest or produced synthetically, which is capable of acting as a point of
initiation of synthesis when placed under conditions in which synthesis of a primer extension product
which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an
inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is
preferably single stranded for maximum efficiency in amplification, but may alternatively be double
stranded. If double stranded, the primer is first treated to separate its strands before being used to
prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must
be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent.
The exact lengths of the primers will depend on many factors, including temperature, source of primer
and the use of the method.

As used herein, the term "probe" refers to an oligonucleotide (i.e., a sequence of nucleotides),
whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly
or by PCR amplification, that is capable of hybridizing to another oligonucleotide of interest. A probe
may be single-stranded or double-stranded. Probes are useful in the detection, identification, and
isolation of particular gene sequences. It is contemplated that any probe used in the present disclosure
will be labeled with any "reporter molecule," so that it is detectable in any detection system, including,
but not limited to, enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent,
radioactive, and luminescent systems. It is not intended that the present disclosure be limited to any
particular detection system or label.

As used herein, the term "target," when used in reference to the polymerase chain reaction,
refers to the region of nucleic acid bounded by the primers used for polymerase chain reaction. Thus,
the "target" is sought to be sorted out from other nucleic acid sequences. A "segment" is defined as a
region of nucleic acid within the target sequence.

As used herein, the term "polymerase chain reaction" ("PCR") refers to the methods of K.B.
Mullis, U.S. Patent Nos. 4,683,195, 4,683,202, and 4,965,188, that describe methods for increasing the
concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or
purification. This process for amplifying the target sequence consists of introducing a large excess of
two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a
precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are
complementary to their respective strands of the double-stranded target sequence. To effect
amplification, the mixture is denatured and the primers are then annealed to their complementary
sequences within the target molecule. Following annealing, the primers are extended with a
polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer
annealing, and polymerase extension can be repeated many times (i.e., denaturation, annealing, and
extension constitute one "cycle"; there can be numerous "cycles") to obtain a high concentration of an
amplified segment of the desired target sequence. The length of the amplified segment of the desired
target sequence is determined by the relative positions of the primers with respect to each other, and
therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the
method is referred to as the "polymerase chain reaction" (hereinafter "PCR"). Because the desired
amplified segments of the target sequence become the predominant sequences (in terms of
concentration) in the mixture, they are said to be "PCR amplified."
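The two quantitative points in the PCR definition (repeated cycles raise the copy number geometrically, and the primer positions fix the product length) can be illustrated with a minimal Python sketch. It assumes an idealized reaction in which each cycle multiplies the target by (1 + E), where E is a hypothetical per-cycle efficiency between 0 and 1, and it uses 1-based, inclusive top-strand coordinates for the primer ends. Real reactions plateau as reagents are consumed, so the copy numbers are upper bounds; the function names are illustrative only.

```python
def ideal_copy_number(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copies after `cycles` rounds: N = N0 * (1 + E) ** cycles.

    `efficiency` (E) is an assumed per-cycle efficiency; 1.0 means perfect
    doubling. Real reactions plateau, so this is an upper bound.
    """
    if initial_copies < 0 or cycles < 0:
        raise ValueError("copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def amplicon_length(forward_5prime: int, reverse_5prime: int) -> int:
    """Length of the segment bounded by the two primers.

    Both arguments are 1-based, inclusive positions on the top strand: the
    5' end of the forward primer, and the top-strand position paired with
    the 5' end of the reverse primer.
    """
    if reverse_5prime < forward_5prime:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return reverse_5prime - forward_5prime + 1


if __name__ == "__main__":
    # One target copy after 30 perfect cycles gives 2 ** 30 copies.
    print(ideal_copy_number(1, 30))    # 1073741824.0
    # Primer 5' ends at positions 101 and 400 bound a 300-bp product.
    print(amplicon_length(101, 400))   # 300
```

Moving either primer changes only `amplicon_length`, which is why the text calls product length a controllable parameter.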

With PCR, it is possible to amplify a single copy of a specific target sequence in genomic
DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe;
incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of
³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In
addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the
appropriate set of primer molecules. In particular, the amplified segments created by the PCR process
are themselves efficient templates for subsequent PCR amplifications.

As used herein, the terms "PCR product," "PCR fragment," and "amplification product" refer
to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation,
annealing, and extension are complete. These terms encompass the case where there has been
amplification of one or more segments of one or more target sequences.

As used herein, the term "amplification reagents" refers to those reagents
(deoxyribonucleotide triphosphates, buffer, etc.) needed for amplification except for primers, nucleic
acid template, and the amplification enzyme. Typically, amplification reagents along with other
reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to
bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

As used herein, the term "recombinant DNA molecule" refers to a DNA
molecule that is composed of segments of DNA joined together by means of molecular biological
techniques.

As used herein, the term "antisense" is used in reference to nucleic acid sequences that are
complementary to a specific target nucleic acid sequence (e.g., mRNA). Included within this definition
are antisense RNA ("asRNA") molecules involved in gene regulation by bacteria. Antisense RNA may
be produced by any method, including synthesis by splicing the gene(s) of interest in a reverse
orientation to a viral promoter that permits the synthesis of a coding strand. Once introduced into an
embryo, this transcribed strand combines with natural mRNA produced by the embryo to form
duplexes. These duplexes then block either the further transcription of the mRNA or its translation. In
this manner, mutant phenotypes may be generated. The term "antisense strand" is used in reference to
a nucleic acid strand that is complementary to the "sense" strand. The designation (-) (i.e., "negative")
is sometimes used in reference to the antisense strand, with the designation (+) sometimes used in
reference to the sense (i.e., "positive") strand. Regions of nucleic acid sequences that are accessible
to antisense molecules can be determined using available computer analysis methods.
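Because an antisense strand is the reverse complement of its sense strand, the relationship can be sketched in a few lines of Python. The function name `antisense` is hypothetical, and the sketch assumes a DNA alphabet (A, C, G, T); an RNA target would use U in place of T. It does not model the computer analysis of target-site accessibility mentioned above.

```python
# Map each DNA base to its Watson-Crick partner (DNA alphabet assumed).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense: str) -> str:
    """Return the antisense strand of `sense`, written 5'->3'.

    The antisense strand is the reverse complement: complement every base,
    then reverse the order so the result also reads 5'->3'.
    """
    if set(sense.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")
    return sense.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sense strand, which mirrors the (+)/(-) pairing described in the definition.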

The term "isolated" when used in relation to a nucleic acid, as in "an isolated oligonucleotide"
or "isolated polynucleotide," refers to a nucleic acid sequence that is identified and separated from at
least one contaminant nucleic acid with which it is ordinarily associated in its natural source. Isolated
nucleic acid is present in a form or setting that is different from that in which it is found in nature. In
contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA found in the state they
exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell
chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence
encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode
a multitude of proteins. However, isolated nucleic acid encoding RFX4_v3 includes, by way of
example, such nucleic acid in cells ordinarily expressing RFX4_v3 where the nucleic acid is in a
chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic
acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide
may be present in single-stranded or double-stranded form. When an isolated nucleic acid,
oligonucleotide, or polynucleotide is to be utilized to express a protein, the oligonucleotide or
polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or
polynucleotide may be single-stranded), but may contain both the sense and antisense strands (i.e., the
oligonucleotide or polynucleotide may be double-stranded).

As used herein, the term "portion" when in reference to a nucleotide sequence (as in "a portion
of a given nucleotide sequence") refers to a fragment of that sequence. The fragments may range in
size from four nucleotides to the entire nucleotide sequence minus one nucleotide (e.g., 10, 20, 30,
40, 50, 100, or 200 nucleotides).

As used herein, the term "coding region" when used in reference to a structural gene refers to the
nucleotide sequences that encode the amino acids found in the nascent polypeptide as a result of
translation of an mRNA molecule. The coding region is bounded, in eukaryotes, on the 5' side by the
nucleotide triplet "ATG" that encodes the initiator methionine and on the 3' side by one of the three
triplets that specify stop codons (i.e., TAA, TAG, TGA).
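The definition of a coding region translates directly into a scan for an ATG followed by an in-frame stop codon (TAA, TAG, or TGA). The Python sketch below is a simplification that assumes a spliced, mRNA-like sequence: it takes the first ATG that has an in-frame stop, and it ignores introns, Kozak context, and alternative start sites. The name `find_coding_region` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(mrna_like: str):
    """Return (start, end) of the first ATG-initiated open reading frame.

    Coordinates are 0-based and half-open, so seq[start:end] includes both
    the ATG and the stop codon. Returns None if no ATG is followed by an
    in-frame stop codon.
    """
    seq = mrna_like.upper().replace("U", "T")  # accept RNA or cDNA text
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    print(find_coding_region("CCATGAAATTTTGACC"))  # (2, 14)
```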

As used herein, the terms "purified" and "to purify" refer to molecules, including,
but not limited to, nucleic or amino acid sequences, proteins, peptides, antibodies, or any organic
molecule, that are removed from their natural environment or from a sample. For example, RFX4_v3
antibodies are purified by removal of contaminating non-immunoglobulin proteins; they are also
purified by the removal of immunoglobulin that does not bind RFX4_v3. The removal of
non-immunoglobulin proteins and/or the removal of immunoglobulins that do not bind RFX4_v3 results in
an increase in the percent of RFX4_v3-reactive immunoglobulins in the sample. In another example,
recombinant RFX4_v3 polypeptides are expressed in bacterial host cells and the polypeptides are
purified by the removal of host cell proteins; the percent of recombinant RFX4_v3 polypeptides is
thereby increased in the sample. Likewise, an "isolated nucleic acid sequence" is a
purified nucleic acid sequence. "Substantially purified" molecules are at least 60% free, preferably at
least 75% free, and more preferably at least 90% free from other components with which they are
naturally associated.

The term "recombinant protein" or "recombinant polypeptide" as used herein refers to a
protein molecule that is expressed from a recombinant DNA molecule.

The term "native protein" is used herein to indicate that a protein does not contain amino acid
residues encoded by vector sequences; that is, the native protein contains only those amino acids found
in the protein as it occurs in nature. A native protein may be produced by recombinant means or may
be isolated from a naturally occurring source.

As used herein, the term "portion" when in reference to a protein (as in "a portion of a given
protein") refers to fragments of that protein. The fragments may range in size from four consecutive
amino acid residues to the entire amino acid sequence minus one amino acid.

The term "antigenic determinant" as used herein refers to that portion of an antigen that makes
contact with a particular antibody (i.e., an epitope). When a protein or fragment of a protein is used to
immunize a host animal, numerous regions of the protein may induce the production of antibodies that
bind specifically to a given region or three-dimensional structure on the protein; these regions or
structures are referred to as antigenic determinants. An antigenic determinant may compete with the
intact antigen (i.e., the "immunogen" used to elicit the immune response) for binding to an antibody.

The term "transgene" as used herein refers to a foreign gene that is placed into an organism by
introducing the foreign gene into newly fertilized eggs or early embryos. The term "foreign gene"
refers to any nucleic acid (e.g., a gene sequence) that is introduced into the genome of an animal by
experimental manipulations and may include gene sequences found in that animal so long as the
introduced gene does not reside in the same location as does the naturally occurring gene.

As used herein, the term "vector" is used in reference to nucleic acid molecules that transfer
DNA segment(s) from one cell to another. The term "vehicle" is sometimes used interchangeably with
"vector."

The term "expression vector" as used herein refers to a recombinant DNA molecule
containing a desired coding sequence and appropriate nucleic acid sequences necessary for the
expression of the operably linked coding sequence in a particular host organism. Nucleic acid
sequences necessary for expression in prokaryotes usually include a promoter, an operator (optional),
and a ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize
promoters, enhancers, and termination and polyadenylation signals.

As used herein, the term "host cell" refers to any eukaryotic or prokaryotic cell (e.g., bacterial
cells such as E. coli, yeast cells, mammalian cells, avian cells, amphibian cells, plant cells, fish cells,
and insect cells), whether located in vitro or in vivo. For example, host cells may be located in a
transgenic animal. In some embodiments, a host cell is a plant cell, an animal cell, or a prokaryotic
cell.

The term "reduced expression" and its grammatical equivalents refer to a lesser expression of a
nucleic acid product in a sample than is found in wild-type controls. Expression may be reduced, for
example, by 10%, 25%, 50%, or more. One method by which reduced expression may be determined
is by using levels of mRNA to indicate a reduced level of expression as compared to that typically
observed in a given tissue in a control or non-transgenic animal. For example, the comparison may be
made between a wild-type mouse and a transgenic mouse that is +/- or -/- for RFX4_v3 expression as a
result of targeted gene disruption (see Detailed Description, section VI). Levels of mRNA are
measured using any of a number of techniques known to those skilled in the art including, but not
limited to, Northern blot analysis. Appropriate controls are included on the Northern blot to control for
differences in the amount of RNA loaded from each tissue analyzed (e.g., the amount of 28S rRNA, an
abundant RNA transcript present at essentially the same amount in all tissues, present in each sample
can be used as a means of normalizing or standardizing the RFX4_v3 mRNA-specific signal observed on
Northern blots). The amount of mRNA present in the band corresponding in size to the correctly
spliced RFX4_v3 transgene RNA is quantified; other minor species of RNA that hybridize to the
transgene probe are not considered in the quantification of the expression of the transgenic mRNA.
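The normalization described above is a ratio of ratios: the target band is first divided by the loading-control band from the same lane (for example, 28S rRNA or actin), and that ratio is then compared with the same ratio from the wild-type lane. The Python sketch below assumes band intensities have already been quantified (for example, by densitometry); the function names and the example numbers are hypothetical and are not data from this disclosure.

```python
def relative_expression(sample_target: float, sample_loading: float,
                        control_target: float, control_loading: float) -> float:
    """Loading-normalized expression of a sample relative to a wild-type control.

    Each target band (the correctly sized transcript) is divided by the
    loading-control band from the same lane before the lanes are compared.
    """
    if min(sample_loading, control_loading, control_target) <= 0:
        raise ValueError("control and loading signals must be positive")
    return (sample_target / sample_loading) / (control_target / control_loading)


def percent_reduction(relative: float) -> float:
    """Percent decrease relative to the control (0 means no change)."""
    return (1.0 - relative) * 100.0


if __name__ == "__main__":
    # Hypothetical band intensities for a +/- lane and a +/+ lane.
    rel = relative_expression(480.0, 960.0, 1000.0, 1000.0)
    print(rel, percent_reduction(rel))  # 0.5 50.0
```

Normalizing each lane before comparing prevents an unevenly loaded lane from being mistaken for a change in expression.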

The term "transfection" as used herein refers to the introduction of foreign DNA into
eukaryotic cells. Transfection may be accomplished by a variety of means known to the art, including
calcium phosphate-DNA co-precipitation, DEAE-dextran-mediated transfection, polybrene-mediated
transfection, electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, retroviral
infection, and biolistics.

The term "stable transfection" or "stably transfected" refers to the introduction and integration
of foreign DNA into the genome of the transfected cell. The term "stable transfectant" refers to a cell
that has stably integrated foreign DNA into the genomic DNA.

The term "transient transfection" or "transiently transfected" refers to the introduction of
foreign DNA into a cell where the foreign DNA fails to integrate into the genome of the transfected
cell. The foreign DNA persists in the nucleus of the transfected cell for several days. During this time
the foreign DNA is subject to the regulatory controls that govern the expression of endogenous genes
in the chromosomes. The term "transient transfectant" refers to cells that have taken up foreign DNA
but have failed to integrate this DNA.

The term "calcium phosphate co-precipitation" refers to a technique for the introduction of
nucleic acids into a cell. The uptake of nucleic acids by cells is enhanced when the nucleic acid is
presented as a calcium phosphate-nucleic acid co-precipitate. The original technique of Graham and
van der Eb (Graham and van der Eb, Virol., 52:456 [1973]) has been modified by several groups to
optimize conditions for particular types of cells. These numerous modifications are well known in the
art.

A "composition comprising a given polynucleotide sequence" as used herein refers broadly to
any composition containing the given polynucleotide sequence. The composition may comprise an
aqueous solution. Compositions comprising polynucleotide sequences encoding RFX4_v3 (e.g., SEQ
ID NO: 5) or fragments thereof may be employed as hybridization probes. In this case, the
RFX4_v3-encoding polynucleotide sequences are typically employed in an aqueous solution containing salts
(e.g., NaCl), detergents (e.g., SDS), and other components (e.g., Denhardt's solution, dry milk, salmon
sperm DNA, etc.).

The term "test compound" refers to any chemical entity, pharmaceutical, drug, and the like
that can be used to treat or inhibit the development of a disease, illness, sickness, or disorder of bodily
function, or otherwise alter the physiological or cellular status of a sample. Test compounds comprise
both known and potential therapeutic compounds. A test compound can be determined to be
therapeutic by screening using the screening methods of the present disclosure. A "known therapeutic
compound" refers to a therapeutic compound that has been shown (e.g., through animal trials or prior
experience with administration to humans) to be effective in such treatment or prevention.

The term "sample" as used herein is used in its broadest sense to include all biological
samples, and by way of example includes amniotic fluid and tissue specimens (such as brain biopsy or
tissue sections). A sample suspected of containing a human chromosome or sequences associated with
a human chromosome may comprise a cell, chromosomes isolated from a cell (e.g., a spread of
metaphase chromosomes), genomic DNA (in solution or bound to a solid support such as for Southern
blot analysis), RNA (in solution or bound to a solid support such as for Northern blot analysis), cDNA
(in solution or bound to a solid support), and the like. A sample suspected of containing a protein may
comprise a cell, a portion of a tissue, an extract containing one or more proteins, and the like.

As used herein, the term "subject" refers to any animal (e.g., a mammal), including, but not
limited to, humans, non-human primates, rodents, and the like, which is to be the recipient of a
particular treatment. Typically, the terms "subject" and "patient" are used interchangeably herein in
reference to a human subject. In addition, "subject" also refers to the unborn progeny of any animal,
including, but not limited to, humans, non-human primates, rodents, and the like.

GENERAL DESCRIPTION OF THE DISCLOSURE 

The present disclosure relates to a novel splice variant of the Regulatory Factor X 4 (RFX4)
member of the winged helix transcription factor family that is preferentially expressed in the
developing brain. Members of the RFX family of winged-helix transcription factors are involved in the
regulation of many cellular processes. This novel splice variant is designated RFX4 variant transcript
3 (RFX4_v3). When one allele is defective, there is universal congenital hydrocephalus with
aqueductal stenosis, probably secondary to agenesis of the subcommissural organ. This defect appears
to be compatible with life, and in some cases with normal fertility. This hydrocephalus exhibits an
autosomal dominant inheritance pattern. When both alleles are defective, there is severe disruption of
brain formation and prenatal or perinatal death.

While an understanding of the mechanism is not required to practice the present disclosure
and the present disclosure is not limited to any particular mechanism, it is contemplated that the RFX4
transcript is responsible for dose-dependent brain phenotypes: hydrocephalus associated with
hypoplasia or aplasia of the subcommissural organ in the heterozygote, and severe and lethal defects of
telencephalon formation in the homozygote. The subcommissural organ appears to be highly
susceptible to quantitative decreases in the expression of this transcript, and the transcript thus appears
to be a key regulator of early telencephalon development. Continued high levels of expression in the
adult brain also suggest a key role after development.

In humans, this RFX4 transcript is composed of 18 exons spanning an approximately 200 kb
region on human chromosome 12. Some of the exons are common to other RFX4 isoforms that are
generally enriched in testis. However, the RFX4_v3 transcript is novel in that it contains a mixture of
exons from two previously identified transcripts as well as a completely novel exon that encodes the
amino terminus of the protein.

This transcript finds use as the basis for diagnostic tests for this type of familial congenital
hydrocephalus, applied to prenatal samples such as amniotic fluid, or to parental DNA specimens for
use in genetic counseling. Knowledge of a familial predisposition to congenital hydrocephalus aids
family planning and genetic counseling decisions, and also permits prenatal diagnosis and early shunt
placement to prevent death or neurological morbidity.

Diagnostic tests also find use for screening potentially heterozygous affected children, both
prenatal and postnatal, and their heterozygous parents. In some embodiments, the diagnostic tests
utilize cDNAs spanning the complete transcript or a partial transcript, or target splice site mutations,
promoter abnormalities, or mutations in the key DNA binding domain.

DETAILED DESCRIPTION OF THE DISCLOSURE 

The present disclosure relates to RFX4_v3 protein and nucleic acids encoding the RFX4_v3
protein. The present disclosure encompasses both native and recombinant wild-type forms of
RFX4_v3, as well as mutant and variant forms, some of which possess altered characteristics relative to
the wild-type RFX4_v3. The present disclosure also relates to methods of using RFX4_v3, including
altered expression in transgenic organisms and expression in prokaryotes and cell culture systems. The
present disclosure also encompasses methods for screening for drugs that inhibit or potentiate
RFX4_v3 action. The present disclosure also relates to methods for screening for susceptibility to
congenital hydrocephalus.

An embodiment of the present disclosure demonstrates that the disrupted expression of the
novel isoform of the RFX4 transcript (RFX4_v3) is responsible for a dosage-dependent brain
phenotype. Congenital hydrocephalus is associated with hypoplasia or absence of the subcommissural
organ (SCO) in heterozygous mice, whereas severe and lethal defects of midline brain structure
formation are found in homozygous mice missing both alleles of the RFX4_v3 gene. The present
disclosure demonstrates that a quantitative decrease in the expression of the RFX4_v3 transcript is
sufficient to interfere specifically with the development of the SCO, leading to effective stenosis of the
aqueduct of Sylvius and congenital hydrocephalus. This partial RFX4_v3 deficiency is nonetheless
compatible with postnatal life, and in some cases with successful fertility. In contrast, in alternate
embodiments of the present disclosure, complete deficiency of this transcript leads to catastrophic
failure of midline structure formation in early brain development and universal prenatal or perinatal
death. The present disclosure identifies RFX4_v3 as a key, early regulator of midline brain structure
development in the vertebrate animal. An embodiment of the disclosure also demonstrates continued
high-level expression of RFX4_v3 in the adult mouse brain, which suggests a role after development.

The RFX family of winged-helix transcription factors comprises seven primary
transcripts, each of which is thought to bind to the "X box" of gene promoters and thus regulate gene
expression (Morotomi-Yano, et al., J. Biol. Chem., 277:836-842 [2002], herein incorporated by
reference). The RFX proteins belong to the winged-helix subfamily of helix-turn-helix transcription
factors, and are so named because they bind to "X-boxes." The RFX4 member of this family has been
described as a testis-specific transcript whose downstream DNA targets were not known
(Morotomi-Yano, et al., J. Biol. Chem., 277:836-842 [2002]). In addition, an estrogen receptor-related
protein contains a portion of the putative RFX4 transcript, and other variants including portions of the
RFX4 sequence are present in GenBank.

The X-box consensus sequence is 5'-GTNRCC(0-3N)RGYAAC-3', where N is any
nucleotide, R is a purine, and Y is a pyrimidine. Five RFX proteins have been described in humans
(RFX1-RFX5), all of which contain a highly conserved DNA binding domain near the amino terminus.
A structure has been determined for the binding of this domain from RFX1 to an X-box sequence
(Gajiwala et al., Nature, 403:916-921 [2000]); this structure shows that the "wing" of this DNA binding
domain is used to recognize DNA. Members of this family have been implicated in the transcriptional
regulation of a number of important genes.
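The consensus can be turned into a pattern search by expanding the IUPAC codes (R, Y, N) into character classes and allowing a 0 to 3 base spacer. The Python sketch below scans both strands of a DNA sequence for exact matches to 5'-GTNRCC(0-3N)RGYAAC-3'. It is an illustration only: it does not score near-matches, a consensus match alone does not show that any RFX protein binds the site, and the function and variable names are hypothetical.

```python
import re

# IUPAC codes used in the X-box consensus: R = purine, Y = pyrimidine, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC[base] for base in motif)


# 5'-GTNRCC(0-3N)RGYAAC-3'. The lookahead lets matches overlap; at each start
# position the greedy spacer reports the longest spacer that still fits.
X_BOX = re.compile("(?=(" + iupac_to_regex("GTNRCC") + "[ACGT]{0,3}"
                   + iupac_to_regex("RGYAAC") + "))")


def find_x_boxes(seq: str):
    """Return sorted (top-strand start, strand, site) tuples, 0-based."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in X_BOX.finditer(seq)]
    rc = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    for m in X_BOX.finditer(rc):
        site = m.group(1)
        # Convert the reverse-strand coordinate back to the top strand.
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return sorted(hits)


if __name__ == "__main__":
    print(find_x_boxes("TTGTTACCAGGCAACTT"))
```

Because the X-box is an imperfect palindrome, the example site is reported on both strands at the same top-strand position.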

A partial sequence of a novel family member, termed RFX4, was initially identified by
Dotzlaw et al. (Dotzlaw et al., Mol. Endocrinol., 6:773-785 [1992]) as part of a fusion cDNA in
human breast cancers, in which the amino-terminal estrogen binding domain of the estrogen receptor
was fused with the RFX DNA binding domain. More recently, two full-length RFX4 cDNAs have
been described and categorized. The new RFX4_v3 variant described here is composed of novel exons
as well as exons derived from one or both of these two earlier variants. As illustrated in Fig. 2, the
RFX4_v3 cDNA is the largest of the three and is composed of a unique 5' exon of approximately 476
bp that encodes the first 14 amino acids of RFX4_v3; this is followed by four exons shared only
with RFX4_v2, then 10 exons shared with both RFX4_v1 and RFX4_v2, and finally three 3' exons
shared only with RFX4_v1.
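The exon arrangement just described can be tallied as a small bookkeeping exercise; the total agrees with the 18 exons stated for the human RFX4_v3 transcript. The Python sketch below encodes only the block structure given in the text (exon counts, not coordinates or sequences), and the data layout and function names are hypothetical choices for illustration.

```python
# Exon blocks of RFX4_v3 in 5'->3' order, as described for Fig. 2. Each block
# records which transcripts contain those exons and how many exons it spans.
RFX4_V3_EXON_BLOCKS = [
    ({"v3"}, 1),               # unique 5' exon
    ({"v2", "v3"}, 4),         # shared only with RFX4_v2
    ({"v1", "v2", "v3"}, 10),  # shared with RFX4_v1 and RFX4_v2
    ({"v1", "v3"}, 3),         # 3' exons shared only with RFX4_v1
]


def total_exons(blocks) -> int:
    """Total number of exons in the transcript described by `blocks`."""
    return sum(count for _, count in blocks)


def exons_shared_with(blocks, variant: str) -> int:
    """Number of RFX4_v3 exons that also occur in `variant`."""
    return sum(count for members, count in blocks if variant in members)


if __name__ == "__main__":
    print(total_exons(RFX4_V3_EXON_BLOCKS))              # 18
    print(exons_shared_with(RFX4_V3_EXON_BLOCKS, "v2"))  # 14
    print(exons_shared_with(RFX4_V3_EXON_BLOCKS, "v1"))  # 13
```

These counts describe only the exons of RFX4_v3; RFX4_v1 and RFX4_v2 may contain additional exons of their own.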

An embodiment of the present disclosure shows that the novel RFX4_v3 transcript is highly
expressed during early to mid-gestation in the mouse, during the critical periods of telencephalon
formation. The novel RFX4_v3 transcript is also highly expressed in adult brain. In still further
embodiments of the present disclosure, a 3' probe used for Northern analysis detected abundant
expression of the RFX4_v1 transcript in testis, and still smaller transcripts in liver.

Abnormalities of the SCO have been associated with hydrocephalus in many studies
(reviewed by Perez-Figares et al., Microsc. Res. Tech., 52:591-607 [2001]). It is contemplated that the
SCO abnormalities precede and cause the hydrocephalus through effective stenosis of the
aqueduct of Sylvius. Therefore, it appears that the aplasia or hypoplasia of the SCO seen in the
RFX4_v3 hemizygous mice is the cause of the congenital hydrocephalus, presumably by interfering
with cerebrospinal fluid flow through the rostral part of the aqueduct.

I. RFX4_v3 Polynucleotides 

The present disclosure arose from the discovery that an epoxygenase transgene had interrupted
the RFX4_v3 gene. Genomic sequences flanking the transgene were identified using PCR based on 5'
and 3' transgene sequences. The presence of at least two tandem copies of the 7.5 kb transgene in
genomic DNA from the transgenic mice indicated that the potential genomic interruption was at least
15 kb in size; Southern blot analysis using a transgene-specific probe indicated that there was only one
copy of this concatenated transgene in the mouse genome. Using the GENOMEWALKER technique with
genomic DNA from the transgenic mice and transgene-specific oligonucleotide primers, both the 5' and
3' flanking genomic sequences into which the transgene had been inserted were identified. When these
sequences were compared to the mouse genomic sequences in the GenBank trace archives, the
transgene insertion site was identified as between bp 528 and 529 in gnl|ti|13973384 and between bp
171 and 172 in gnl|ti|84074979. The 5' and 3' flanking sequences identified by the
GENOMEWALKER technique were contiguous in the normal mouse genomic sequences in the trace
archives, indicating that the transgene insertion was not accompanied by a genomic deletion, as has
been seen in some recent examples of accidental transgenic insertional mutagenesis (Durkin, et al.,
Genomics, 73:20-7 [2001]; Overbeek, et al., Genesis, 30:26-35 [2001]). Southern analysis using a 3'
insertion-site-specific probe demonstrated the presence of single novel bands in restriction
enzyme-digested DNA from the transgenic mice, confirming a single transgene insertion site at this location.

The flanking sequences identified by the GENOMEWALKER approach were merged with
available mouse genomic sequences from trace archives to form a small contig; no cDNAs or
expressed sequence tags (ESTs) matched. Therefore, the assembled mouse contig was used to search
the human genome sequences then available in GenBank, using BLAST. The mouse sequence was
highly related (E value 4e-28) to a human genomic sequence corresponding to a portion of human
chromosome 12 (GenBank Accession No. NT_009720.8). See Fig. 1 (entitled, "Alignment of mouse
sequences with the human chromosome 12 genomic clone NT_009720"). When this small region of
human genomic sequence was analyzed for expressed sequences, it did not match any expressed
sequence in GenBank. However, when a much larger amount of human genomic DNA from this locus
was used to search for expressed sequences, genomic DNA within 200 kb of the human sequence
corresponding to the transgene insertion site was found to contain all of the exons of two distinct
cDNAs in GenBank that correspond to two forms of the human winged helix protein RFX4. One form,
RFX4 variant transcript 2 (RFX4_v2), is represented by GenBank Accession No. NM_002920 (SEQ ID
NO: 1), corresponding to protein accession number NP_002911 (SEQ ID NO: 2). The other form,
RFX4 variant transcript 1 (RFX4_v1), is represented by GenBank Accession No. NM_032491 (SEQ ID
NO: 14), corresponding to protein accession number NP_115880 (SEQ ID NO: 15). RFX4_v1 is
derived from GenBank Accession No. AF332192 (SEQ ID NO: 3), corresponding to protein accession
number AAK17191 (SEQ ID NO: 4). See Fig. 2 (entitled, "The human RFX4_v3 locus"). According
to these alignments, the site of the transgene insertion within the mouse genome was at a corresponding
region within the human genome that would be within the intron between exons 13 and 14 of
AF332192 (SEQ ID NO: 3) (or RFX4_v1) and would not have affected the exon arrangements of
NM_002920 (SEQ ID NO: 1) (or RFX4_v2).

Using PCR primers based on the inserted transgene and the neighboring endogenous mouse
genomic DNA, the wild-type (+/+), heterozygous (+/-, one allele disrupted), and homozygous (-/-, both
alleles disrupted) genotypes were found to be readily distinguished in a litter of newborn mice from
interbred transgenic mice.
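The genotyping logic can be sketched as follows. The sketch assumes a hypothetical two-product assay: one primer pair flanking the insertion site, which amplifies only the intact wild-type allele (the concatenated insert of at least 15 kb being assumed too long to amplify under standard conditions), and one transgene primer paired with a flanking genomic primer, which amplifies only the disrupted allele. It also computes the 1:2:1 Mendelian ratio expected at conception from a heterozygous intercross; observed newborn ratios may differ, because complete deficiency leads to prenatal or perinatal death. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def call_genotype(wild_type_band: bool, insertion_band: bool) -> str:
    """Infer genotype from the presence of the two PCR products.

    wild_type_band: product from primers flanking the intact insertion site.
    insertion_band: product from a transgene primer paired with a flanking
    genomic primer (spans the transgene/genome junction).
    """
    if wild_type_band and insertion_band:
        return "+/-"
    if wild_type_band:
        return "+/+"
    if insertion_band:
        return "-/-"
    raise ValueError("no product: failed reaction, genotype cannot be called")


def expected_genotypes(parent1: str = "+/-", parent2: str = "+/-") -> dict:
    """Mendelian genotype frequencies expected at conception for one cross."""
    pairs = product(parent1.split("/"), parent2.split("/"))
    # sorted() puts "+" before "-", so heterozygotes are always written "+/-".
    counts = Counter("/".join(sorted(pair)) for pair in pairs)
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print(call_genotype(True, True))  # +/-
    print(expected_genotypes())
```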

To examine whether the transgene insertion interfered with the expression of a full-length
mouse RFX4 transcript in brain, Northern blots of RNA from the brains of neonatal (+/+), (+/-), and
(-/-) mice were performed with a mouse brain EST cDNA clone (IMAGE # 763537, GenBank Accession
Nos. AA285775 and AI462920) that was highly related (E value e-124 over 284 aligned bases) to the
3' end of the human cDNA RFX4_v1 (SEQ ID NO: 14). Brains from the +/+ mice expressed a prominent
band of ~4 kb that is referred to as RFX4 variant transcript 3 (RFX4_v3). The blots also revealed that
the brains from the (+/-) heterozygous mice expressed approximately 50% of the normal transcript,
whereas the brains from the (-/-) homozygous mice expressed no detectable transcript of this size.
Probing the same blot with an actin cDNA demonstrated that gel loading was similar in the three lanes.
Similar results were obtained in three separate experiments. There was no evidence for the expression
of a truncated mRNA in the brain samples from either the +/- or -/- mice. These studies confirmed that
an mRNA species of 4 kb that was recognized by a probe derived from putative mouse 3' RFX4_v1
sequences was decreased in amount in the brains of (+/-) heterozygous mice and absent in the brains
of (-/-) homozygous mice, indicating that the insertion of the transgene interfered with the expression
of the putative brain RFX4_v3 transcript.

Using the same probe to examine the tissue-specific and developmental expression of this
RFX4 transcript, high-level expression of a slightly smaller transcript was found in normal adult testis,
and lower-level expression of a considerably smaller transcript was found in liver. The largest species,
which corresponds to the brain-specific RFX4_v3 transcript, was highly expressed in whole embryos
early in development, initially appearing between embryonic day (E) 7.5 and E9.5.

Using primers based on brain-specific mouse EST sequences that contained internal sequences
highly related to the human RFX4 cDNAs in GenBank, PCR of an adult mouse brain cDNA library
was used to generate a 3 kb plasmid insert that was then sequenced. This cDNA has been designated
the RFX4 variant transcript 3 (RFX4_v3). When this sequence was merged with all available 5' and 3'
mouse ESTs from GenBank, the resulting mouse RFX4_v3 transcript (SEQ ID NO: 5) (GenBank
Accession No. AY102010) closely approximated the transcript size seen on Northern blots. Similar
probes were then used to screen a human brain cDNA library, and positive inserts were sequenced.
This novel DNA sequence has been designated human RFX4_v3 (SEQ ID NO: 7) (GenBank
Accession No. AY102009). The predicted unique mouse amino terminal protein sequence was also
used to search the non-human, non-mouse ESTs in GenBank. A zebrafish EST clone (AI657628)
with a nearly identical predicted protein sequence was obtained from the IMAGE consortium and
sequenced. This cDNA sequence is referred to as zebrafish RFX4_v3 (SEQ ID NO: 9) (GenBank
Accession No. AY102011).

The human chromosome 12 sequence was then searched with the mouse and human cDNA
sequences. This search identified the exons contributing to the novel human RFX4_v3 isoform
(SEQ ID NO: 7) and the exons corresponding to the two previously described human cDNAs
(SEQ ID NOS: 1, 3, and 14). The two previously described human RFX4 cDNAs (RFX4_v1 and
RFX4_v2) are composed of both unique and shared exons. In the case of the cDNA represented by
accession number NM_002920 (SEQ ID NO: 1), the first five exons (shown as exons 1-5 of RFX4_v2
(NM_002920) in Fig. 2) correspond to five exon coding sequences within the 90 kb interval between
bp 390,000 and 480,000 of the genomic clone NT_009720.8 (in reverse complement orientation). The
next nine exons and part of a tenth (exons 6-15A of RFX4_v2 in Fig. 2) are common to the other
version of RFX4 in GenBank (RFX4_v1), represented by the cDNA NM_032491 (SEQ ID NO: 14).
These 10 exons are derived from exon coding sequences in the genomic clone NT_009720.8 between
bp 340,000 and 400,000. As shown in Fig. 2, the final (15th) exon of RFX4_v2 contains a
polyadenylation (poly A) sequence that allows for final processing of the mature mRNA.

The other human cDNA, RFX4_v1, contains an amino terminal exon 1 (hatching in Fig. 2) that is
encoded by an exon located between exons 5 and 6 of RFX4_v2. RFX4_v1 then shares exons 2-11
with RFX4_v2 (exons 6-15A), followed by three unique carboxyl terminal exons (exons 12-14 of
RFX4_v1). These last three unique exons are found within the interval bp 315,000-325,000 of the
genomic clone NT_009720.8. Exon 12 of RFX4_v1 is apparently spliced into exon 15 of RFX4_v2,
resulting in the novel 3' end of RFX4_v1 and a different poly A tail. The displaced sequence in
RFX4_v2 is represented as exon 15B in Fig. 2.

The exon pattern that corresponds to the mouse and human RFX4_v3 mRNAs and proteins is
illustrated at the bottom of Fig. 2. A completely novel exon 1, derived from a sequence between
bp 480,000 and 500,000 of NT_009720.8, encodes the first 14 amino acids at the amino terminal
end (Fig. 2). The next four exons, 2-5, are composed of the four exons of the same number from
RFX4_v2; exon 1 of RFX4_v2 is not present in the RFX4_v3 cDNA. The middle of the RFX4_v3
cDNA is formed by the 10 exons (exons 6-15 of RFX4_v3) held in common between RFX4_v2 (exons
6-15A) and RFX4_v1 (exons 2-11). The carboxyl terminus of RFX4_v3 (exons 16-18) is composed of
the three carboxyl-terminal exons present only in RFX4_v1 (exons 12-14). Thus, the novel RFX4_v3
isoform (SEQ ID NO: 7) described here comprises a unique arrangement of 18 exons derived from
almost 200 kb of human genomic sequence. One exon (the first) appears to be unique to this sequence;
exons 2-5 are shared with RFX4_v2; exons 6-15 are shared with both RFX4_v1 and RFX4_v2; and
exons 16-18 are shared with only RFX4_v1.
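The exon correspondences described above can be checked mechanically. The following Python sketch is a hypothetical data structure. The names `build_v3_exon_map` and `exons_shared_with` are illustrative and are not part of the disclosure. The map encodes only the exon relationships stated in the text. It does not model genomic coordinates or splice sites.

```python
"""Sketch of the RFX4_v3 exon composition described for Fig. 2.

Hypothetical representation: keys are RFX4_v3 exon numbers. Each value
records the corresponding exon label in RFX4_v2 and RFX4_v1, or None where
the text reports no counterpart.
"""


def build_v3_exon_map():
    """Return {v3_exon: {"RFX4_v2": label or None, "RFX4_v1": label or None}}."""
    exon_map = {1: {"RFX4_v2": None, "RFX4_v1": None}}  # exon 1 is unique to v3
    for n in range(2, 6):  # v3 exons 2-5 are v2 exons 2-5 (v2 exon 1 is absent)
        exon_map[n] = {"RFX4_v2": str(n), "RFX4_v1": None}
    for offset, n in enumerate(range(6, 16)):  # v3 exons 6-15 are shared by v1 and v2
        v2_label = "15A" if n == 15 else str(n)  # v2 contributes only part of exon 15
        exon_map[n] = {"RFX4_v2": v2_label, "RFX4_v1": str(2 + offset)}
    for offset, n in enumerate(range(16, 19)):  # v3 exons 16-18 are v1 exons 12-14
        exon_map[n] = {"RFX4_v2": None, "RFX4_v1": str(12 + offset)}
    return exon_map


def exons_shared_with(exon_map, isoform):
    """List the RFX4_v3 exons that have a counterpart in the given isoform."""
    return [n for n, src in sorted(exon_map.items()) if src[isoform] is not None]


if __name__ == "__main__":
    m = build_v3_exon_map()
    print(len(m))                              # 18 exons in total
    print(exons_shared_with(m, "RFX4_v2"))     # exons 2 through 15
    print(exons_shared_with(m, "RFX4_v1"))     # exons 6 through 18
    print(m[17]["RFX4_v1"], m[18]["RFX4_v1"])  # counterparts of v3 exons 17 and 18
```

In this representation, RFX4_v3 exons 17 and 18 map to RFX4_v1 exons 13 and 14. This agrees with the earlier statement that the insertion site lies in the intron between exons 13 and 14 of AF332192 (RFX4_v1).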

The site of transgene interruption of RFX4_v3 is also illustrated in Fig. 2 with a large black X.
The transgene, greater than 15 kb in length, was inserted into the intron between exons 17 and 18 of
RFX4_v3 (SEQ ID NO: 7), within the carboxyl-terminal end of the protein coding region. It appears to
interfere with splicing of the final exon and with generation of an intact mature mRNA. However, an
understanding of the mechanism is not necessary in order to make and use the present disclosure.

The present disclosure also provides nucleic acids encoding RFX4_v3 genes, homologs,
variants, and mutants (e.g., SEQ ID NOS: 1, 3, 5, 7, and 9). In some embodiments, polynucleotide
sequences are capable of hybridizing to SEQ ID NOS: 1, 3, 5, 7, and 9 under conditions of low to high
stringency, as long as the hybridizing polynucleotide sequence encodes a protein that retains
a biological activity of the naturally occurring RFX4_v3. In some embodiments, the protein that
retains a biological activity of naturally occurring RFX4_v3 is 70% homologous to wild-type
RFX4_v3, preferably 80% homologous to wild-type RFX4_v3, more preferably 90% homologous to
wild-type RFX4_v3, and most preferably 95% homologous to wild-type RFX4_v3. In preferred
embodiments, hybridization conditions are based on the melting temperature (Tm) of the nucleic acid
binding complex and confer a defined "stringency" as explained above (see e.g., Wahl, et al., Meth.
Enzymol., 152:399-407 [1987]).
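The tiered identity thresholds above can be illustrated with a short Python sketch. It makes two assumptions that the text does not state. First, "homologous" is read as percent amino acid identity over a pairwise alignment already produced by a separate tool. Second, each percentage is treated as a minimum. The function and tier names are hypothetical.

```python
"""Percent identity and the preference tiers named in the text (sketch)."""


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two sequences already aligned to equal length.

    Columns where both sequences carry a gap ('-') are ignored; a residue
    opposite a gap counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # skip gap-gap columns
        compared += 1
        if a == b:
            matches += 1  # a == b here implies a real residue, not a gap
    if compared == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / compared


TIERS = [  # (minimum percent identity, label); thresholds taken from the text
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (70.0, "minimum"),
]


def homology_tier(pct: float):
    """Return the highest tier whose threshold pct meets, or None below 70%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None


if __name__ == "__main__":
    pct = percent_identity("MKT-AL", "MKTQAV")  # hypothetical aligned fragments
    print(round(pct, 1), homology_tier(pct))
```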

In other embodiments of the present disclosure, alleles of RFX4_v3 are provided. In preferred
embodiments, alleles result from a mutation (i.e., a change in the nucleic acid sequence) and generally
produce altered mRNAs or polypeptides whose structure or function may or may not be altered. Any
given gene may have none, one, or many allelic forms. Common mutational changes that give rise to
alleles are generally ascribed to deletions, additions, or substitutions of nucleic acids. Each of these
types of changes may occur alone, or in combination with the others, and at the rate of one or more
times in a given sequence. Examples of the alleles of the present disclosure include those encoded by
SEQ ID NOS: 5, 7, and 9 (wild-type) and those same sequences with an epoxygenase transgene
insertion, resulting in congenital hydrocephalus alleles.

In still other embodiments of the present disclosure, the nucleotide sequences of the present
disclosure may be engineered in order to alter an RFX4_v3 coding sequence for a variety of reasons,
including but not limited to alterations that modify the cloning, processing, and/or expression of the
gene product. For example, mutations may be introduced using techniques that are well known in the
art (e.g., site-directed mutagenesis to insert new restriction sites, to alter glycosylation patterns, or to
change codon preference). In some embodiments, mutations are created in the sequence to
generate a dysfunctional gene product (e.g., a stop codon is placed at any position within the coding
sequence). Such compositions find use as positive controls and for generating null cell lines and
animal models through homologous recombination substituting for the wild-type counterpart. Such
compositions also find use as a control for dose-dependent expression of congenital hydrocephalus.
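A minimal illustration of placing a stop codon within a coding sequence is the Python sketch below, which substitutes one in-frame codon with a stop codon. It works at the sequence level only. It assumes an in-frame coding sequence that begins with ATG and ends with its natural stop codon. The function name is hypothetical. The sketch does not design mutagenic primers or model homologous recombination.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_premature_stop(cds: str, codon_index: int, stop: str = "TAA") -> str:
    """Replace the codon at 0-based codon_index with a stop codon.

    The substitution keeps the sequence length unchanged. codon_index must lie
    after the start codon (index 0) and before the natural stop codon.
    """
    cds, stop = cds.upper(), stop.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop} is not a standard stop codon")
    n_codons = len(cds) // 3
    if not 0 < codon_index < n_codons - 1:
        raise ValueError("codon_index must fall between the start and stop codons")
    start = codon_index * 3
    return cds[:start] + stop + cds[start + 3:]


if __name__ == "__main__":
    # Hypothetical 4-codon ORF: ATG GCT GCT TAA
    print(insert_premature_stop("ATGGCTGCTTAA", 1))  # ATGTAAGCTTAA
```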

In some embodiments, the polynucleotide sequence of RFX4_v3 may be extended by using
the nucleotide sequences (e.g., SEQ ID NOS: 5, 7, and 9) in various methods known in the art to detect
upstream sequences such as promoters and regulatory elements. Using this approach, the sequences of
the proximal promoters for human RFX4_v3 (SEQ ID NO: 11) and mouse RFX4_v3 (SEQ ID NO: 12)
were identified. Figure 3 shows a partial alignment of the human and mouse proximal promoter
sequences for RFX4_v3.

In other embodiments, it is contemplated that restriction-site polymerase chain reaction (PCR)
finds use in the present disclosure for identifying unknown sequences adjacent to RFX4_v3. This is a
direct method that uses universal primers to retrieve unknown sequence adjacent to a known locus
(Gobinda et al., PCR Methods Applic., 2:318-22 [1993]). First, genomic DNA is amplified in the
presence of a primer to a linker sequence and a primer specific to the known region. The amplified
sequences are then subjected to a second round of PCR with the same linker primer and another
specific primer internal to the first one. Products of each round of PCR are transcribed with an
appropriate RNA polymerase and sequenced using reverse transcriptase.

In another embodiment, inverse PCR is used to amplify or extend sequences using divergent
primers based on a known region (Triglia et al., Nucleic Acids Res., 16:8186 [1988]). The primers may
be designed using Oligo 4.0 (National Biosciences Inc., Plymouth, MN), or another appropriate
program, to be 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal to the
target sequence at temperatures of about 68-72°C. The method uses several restriction enzymes to
generate a suitable fragment in the known region of a gene. The fragment is then circularized by
intramolecular ligation and used as a PCR template. In still other embodiments, walking PCR is
utilized. Walking PCR is a method for targeted gene walking that permits retrieval of unknown
sequence (Parker et al., Nucleic Acids Res., 19:3055-60 [1991]). The PROMOTERFINDER kit
(Clontech) uses PCR, nested primers, and special libraries to "walk in" genomic DNA. This process
avoids the need to screen libraries and is useful in finding intron/exon junctions. Preferred libraries for
screening for full-length cDNAs include mammalian libraries (e.g., the mouse and human libraries that
were originally used to identify isoforms of RFX4) that have been size-selected to include larger
cDNAs. Random primed libraries are also preferred, in that they will contain more sequences that
contain the 5' and upstream gene regions. A randomly primed library may be particularly useful in
cases where an oligo d(T) library does not yield full-length cDNA. Genomic mammalian libraries are
useful for obtaining introns and extending 5' sequence.
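The inverse PCR primer criteria given above (22-30 nucleotides, GC content of 50% or more, annealing near 68-72°C) can be screened with a short Python sketch. The melting temperature uses the basic, salt-unadjusted approximation Tm = 64.9 + 41(G+C - 16.4)/N. This formula is an assumption for illustration and is not the algorithm implemented in Oligo 4.0. Different Tm methods can differ by several degrees, so the window check is indicative only. The function names are hypothetical.

```python
VALID_BASES = set("ACGT")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm_basic(seq: str) -> float:
    """Rough Tm (deg C) from Tm = 64.9 + 41*(G+C - 16.4)/N.

    A basic approximation intended for oligos longer than about 13 nt; it
    ignores salt, primer concentration, and nearest-neighbor effects.
    """
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


def check_inverse_pcr_primer(seq, min_len=22, max_len=30, min_gc=0.50,
                             tm_window=(68.0, 72.0)):
    """Screen a primer against the length, GC, and annealing criteria."""
    seq = seq.upper()
    if not seq or set(seq) - VALID_BASES:
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    tm = estimate_tm_basic(seq)
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_ok": gc_fraction(seq) >= min_gc,
        "tm_estimate": round(tm, 1),
        "tm_in_window": tm_window[0] <= tm <= tm_window[1],
    }


if __name__ == "__main__":
    print(check_inverse_pcr_primer("GCGCATGCGCATGCGCATGCGCAT"))  # hypothetical primer
```

The hypothetical 24-mer passes the length and GC checks. Its rough Tm estimate (about 64.2°C) falls below the window, which shows how the choice of Tm method can affect whether a primer appears to meet the stated range.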

In other embodiments of the present disclosure, variants of the disclosed RFX4_v3 sequences
are provided. In preferred embodiments, variants result from mutation (i.e., a change in the nucleic
acid sequence) and generally produce altered mRNAs or polypeptides whose structure or function may
or may not be altered. Where the structure or function of the mRNA or polypeptide is altered, a
dose-dependent congenital hydrocephalus phenotype appears. Any given gene may have none, one, or
many variant forms. Common mutational changes that give rise to variants are generally ascribed to
deletions, additions, or substitutions of nucleic acids. Each of these types of changes may occur alone,
or in combination with the others, and at the rate of one or more times in a given sequence. Diagnostic
methods can detect these mutational changes to diagnose or predict the development of RFX4_v3-linked
congenital hydrocephalus.
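The three mutational classes named above (deletions, additions, and substitutions) can be distinguished by a simple comparison against a reference. The Python sketch below is hypothetical and deliberately naive. It assumes one of two cases: the sample has the same length as the reference, in which case it reports substituted positions, or the sample differs only by an indel, whose position it does not attempt to locate. A real diagnostic assay would rely on proper alignment and validated variant calling, neither of which is described in the text.

```python
def compare_to_reference(reference: str, sample: str) -> dict:
    """Naively classify how a sample sequence differs from a reference.

    Equal lengths: report 1-based substitution positions.
    Unequal lengths: report a deletion or addition and the net length change
    (the position of the indel is not located in this sketch).
    """
    reference, sample = reference.upper(), sample.upper()
    length_change = len(sample) - len(reference)
    if length_change != 0:
        kind = "deletion" if length_change < 0 else "addition"
        return {"kind": kind, "length_change": length_change, "substitutions": []}
    substitutions = [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]
    kind = "substitution" if substitutions else "none"
    return {"kind": kind, "length_change": 0, "substitutions": substitutions}


if __name__ == "__main__":
    print(compare_to_reference("ATGGCT", "ATGACT"))  # hypothetical sequences
```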

A modified peptide can be produced in which the nucleotide sequence encoding the
polypeptide has been altered, such as by substitution, deletion, or addition. In particularly preferred
embodiments, these modifications do not significantly reduce the biological activity of the modified
RFX4_v3. In other words, a modified construct can be evaluated in order to determine whether it is a
member of the genus of modified or variant RFX4_v3 proteins of the present disclosure, as defined
functionally rather than structurally. In preferred embodiments, the activity of variant or mutant
RFX4_v3 is evaluated by the presence of the congenital hydrocephalus phenotype, for example in mice
that express the variant. Accordingly, in some embodiments, the present disclosure provides nucleic
acids encoding RFX4_v3 variants that confer varying degrees of congenital hydrocephalus.

Moreover, as described above, variant forms of RFX4_v3 are also contemplated as being 
equivalent to those peptides and DNA molecules that are set forth in more detail herein. For example, 
it is contemplated that isolated replacement of a leucine with an isoleucine or valine, an aspartate with a 
glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally 
related amino acid (i.e., conservative mutations) will not have a major effect on the biological activity 
of the resulting molecule. Accordingly, some embodiments of the present disclosure provide variants 
of RFX4_v3 disclosed herein containing conservative replacements. Conservative replacements are 
those that take place within a family of amino acids that are related in their side chains. Genetically 
encoded amino acids can be divided into four families: (1) acidic (aspartate, glutamate); (2) basic 
(lysine, arginine, histidine); (3) nonpolar (alanine, valine, leucine, isoleucine, proline, phenylalanine, 
methionine, tryptophan); and (4) uncharged polar (glycine, asparagine, glutamine, cysteine, serine, 
threonine, tyrosine). Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as 
aromatic amino acids. In similar fashion, the amino acid repertoire can be grouped as (1) acidic 
(aspartate, glutamate); (2) basic (lysine, arginine, histidine); (3) aliphatic (glycine, alanine, valine,
leucine, isoleucine, serine, threonine), with serine and threonine optionally grouped separately as
aliphatic-hydroxyl; (4) aromatic (phenylalanine, tyrosine, tryptophan); (5) amide (asparagine,
glutamine); and (6) sulfur-containing (cysteine and methionine) (e.g., Stryer ed., Biochemistry, pp. 17-21,
2nd ed., WH Freeman and Co., 1981). Whether a change in the amino acid sequence of a peptide
results in a functional homolog can be readily determined by assessing the ability of the variant peptide 
to function in a fashion similar to the wild-type protein. Peptides having more than one replacement 
can readily be tested in the same manner. 
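The four side-chain families listed above can be encoded directly to test whether a replacement is conservative under that scheme. This Python sketch uses one-letter amino acid codes and only the first (four-family) grouping from the text. The alternative six-group scheme would classify some pairs differently. For example, glycine and alanine fall in different families here but share the aliphatic group in the six-group scheme. The names are illustrative.

```python
# Four side-chain families as listed in the text (one-letter codes).
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "nonpolar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same family under this scheme."""
    return family_of(original) == family_of(replacement)


if __name__ == "__main__":
    for a, b in [("L", "I"), ("D", "E"), ("T", "S"), ("G", "W")]:
        print(a, "->", b, is_conservative(a, b))
```

The glycine-to-tryptophan case returns False, which matches the text's example of a nonconservative change.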

More rarely, a variant includes "nonconservative" changes (e.g., replacement of a glycine with 
a tryptophan). Analogous minor variations can also include amino acid deletions or insertions, or both. 
Guidance in determining which amino acid residues can be substituted, inserted, or deleted without 
abolishing biological activity can be found using computer programs (e.g., LASERGENE software,
DNASTAR Inc., Madison, WI). 

Variants may be produced by methods such as directed evolution or other techniques for producing
combinatorial libraries of variants, described in more detail below. In still other embodiments of the
present disclosure, the nucleotide sequences of the present disclosure may be engineered in order to
alter an RFX4_v3 coding sequence including, but not limited to, alterations that modify the cloning,
processing, localization, secretion, and/or expression of the gene product. For example, mutations may
be introduced using techniques that are well known in the art (e.g., site-directed mutagenesis to insert
new restriction sites, alter glycosylation patterns, or change codon preference).

II. RFX4_v3 Polypeptides 

In other embodiments, the present disclosure provides RFX4_v3 polynucleotide sequences
that encode RFX4_v3 polypeptide sequences. An alignment of the amino terminal ends of three
predicted amino acid sequences is shown in Fig. 4 and in Fig. 6 for human, mouse, and zebrafish
RFX4_v3 (SEQ ID NOS: 8, 6, and 10, respectively); these translated protein sequences correspond to
the nucleic acid sequences of SEQ ID NOS: 37, 38, and 39, respectively. There is significant amino acid
identity between the mouse and human proteins, as demonstrated by Fig. 5 and Fig. 6. There is 96%
amino acid identity between the predicted mouse and human proteins, and 83% amino acid identity
between the human and zebrafish proteins. The alignment also illustrates several of the characteristic
domains of the RFX4_v3 proteins that are conserved in all three orthologues, i.e., the highly conserved
DNA binding domain, boxes B and C, and the dimerization domain (Morotomi-Yano, et al., J. Biol.
Chem., 277:836-842 [2002]). See Fig. 6.

Other embodiments of the present disclosure provide fragments, fusion proteins, or functional
equivalents of these RFX4_v3 proteins. In still other embodiments of the present disclosure, nucleic
acid sequences corresponding to these various RFX4_v3 homologs and mutants may be used to
generate recombinant DNA molecules that direct the expression of the RFX4_v3 homologs and
mutants in appropriate host cells. In some embodiments of the present disclosure, the polypeptide may
be a naturally purified product; in other embodiments it may be a product of chemical synthetic
procedures; and in still other embodiments it may be produced by recombinant techniques using a
prokaryotic or eukaryotic host (e.g., bacterial, yeast, higher plant, insect, and mammalian cells in
culture). In some embodiments, depending upon the host employed in a recombinant production
procedure, the polypeptide of the present disclosure may be glycosylated or non-glycosylated.
In other embodiments, the polypeptides of the disclosure may also include an initial methionine amino
acid residue.

In one embodiment of the present disclosure, due to the inherent degeneracy of the genetic
code, DNA sequences other than the polynucleotide sequences described above, which encode
substantially the same or a functionally equivalent amino acid sequence, may be used to clone and
express RFX4_v3. In general, such polynucleotide sequences hybridize to the sequences described
above under conditions of high to medium stringency as described above. As will be understood by
those of skill in the art, it may be advantageous to produce RFX4_v3-encoding nucleotide sequences
possessing non-naturally occurring codons. Therefore, in some preferred embodiments, codons
preferred by a particular prokaryotic or eukaryotic host (Murray et al., Nucl. Acids Res., 17 [1989]) are
selected, for example, to increase the rate of RFX4_v3 expression or to produce recombinant RNA
transcripts having desirable properties, such as a longer half-life than that of transcripts produced from
naturally occurring sequence.
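Selecting host-preferred codons while keeping the encoded protein unchanged can be sketched in Python as below. The preference table `PREFERRED` is hypothetical and was chosen only for illustration. It is not derived from Murray et al. or from measured codon-usage data. The sketch uses the standard genetic code and checks that the recoded sequence translates to the same protein as the original.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical host preferences for illustration only; not measured codon usage.
PREFERRED = {"L": "CTG", "R": "CGT", "S": "AGC", "G": "GGC", "*": "TAA"}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks stop codons)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():  # reject non-synonymous preferences
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    recoded = "".join(preferred.get(CODON_TABLE[c], c) for c in codons)
    if translate(recoded) != translate(cds):  # defensive check: protein unchanged
        raise AssertionError("recoding changed the encoded protein")
    return recoded


if __name__ == "__main__":
    original = "ATGTTACGAAGTGGATAG"  # hypothetical ORF encoding M L R S G stop
    print(recode(original, PREFERRED))
```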

1. Vectors for Production of RFX4_v3 

The polynucleotides of the present disclosure may be employed for producing polypeptides by
recombinant techniques. Thus, for example, the polynucleotide may be included in any one of a
variety of expression vectors for expressing a polypeptide. In some embodiments of the present
disclosure, vectors include, but are not limited to, chromosomal, nonchromosomal, and synthetic DNA
sequences (e.g., derivatives of SV40, bacterial plasmids, phage DNA, baculovirus, yeast plasmids,
vectors derived from combinations of plasmids and phage DNA, and viral DNA such as vaccinia,
adenovirus, fowl pox virus, and pseudorabies). It is contemplated that any vector may be used as long
as it is replicable and viable in the host.

In particular, some embodiments of the present disclosure provide recombinant constructs
comprising one or more of the sequences as broadly described above (e.g., SEQ ID NOS: 5, 7, and 9).
In some embodiments of the present disclosure, the constructs comprise a vector, such as a plasmid or
viral vector, into which a sequence of the disclosure has been inserted, in a forward or reverse
orientation. In still other embodiments, the heterologous structural sequence is assembled in
appropriate phase with translation initiation and termination sequences. In preferred embodiments of
the present disclosure, the appropriate DNA sequence is inserted into the vector using any of a variety
of procedures. In general, the DNA sequence is inserted into an appropriate restriction endonuclease
site(s) by procedures known in the art.

Large numbers of suitable vectors are known to those of skill in the art, and are commercially
available. Such vectors include, but are not limited to, the following vectors: 1) Bacterial: pQE70,
pQE60, pQE-9 (Qiagen), pBS, pD10, phagescript, psiX174, pbluescript SK, pBSKS, pNH8A, pNH16a,
pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); and 2)
Eukaryotic: pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL
(Pharmacia). Any other plasmid or vector may be used as long as it is replicable and viable in the
host. In some preferred embodiments of the present disclosure, mammalian expression vectors
comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome
binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking non-transcribed sequences. In other embodiments, DNA sequences derived
from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed
genetic elements.

In certain embodiments of the present disclosure, the DNA sequence in the expression vector
is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA
synthesis. Promoters useful in the present disclosure include, but are not limited to, the LTR or SV40
promoter, the E. coli lac or trp, the phage lambda PL and PR, T3 and T7 promoters, and the
cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, and mouse
metallothionein-I promoters and other promoters known to control expression of genes in prokaryotic or
eukaryotic cells or their viruses. In other embodiments of the present disclosure, recombinant
expression vectors include origins of replication and selectable markers permitting transformation of
the host cell (e.g., dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or
tetracycline or ampicillin resistance in E. coli).

In some embodiments of the present disclosure, transcription of the DNA encoding the
polypeptides of the present disclosure by higher eukaryotes is increased by inserting an enhancer
sequence into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 bp in
length, that act on a promoter to increase its transcription. Enhancers useful in the present disclosure
include, but are not limited to, the SV40 enhancer on the late side of the replication origin (bp 100 to
270), a cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the
replication origin, and adenovirus enhancers.
In other embodiments, the expression vector also contains a ribosome binding site for
translation initiation and a transcription terminator. In still other embodiments of the present
disclosure, the vector may also include appropriate sequences for amplifying expression.

2. Host Cells for Production of RFX4_v3

In a further embodiment, the present disclosure provides host cells containing the above-described
constructs. In some embodiments of the present disclosure, the host cell is a higher
eukaryotic cell (e.g., a mammalian or insect cell). In other embodiments of the present disclosure, the
host cell is a lower eukaryotic cell (e.g., a yeast cell). In still other embodiments of the present
disclosure, the host cell can be a prokaryotic cell (e.g., a bacterial cell). Specific examples of host cells
include, but are not limited to, Escherichia coli, Salmonella typhimurium, Bacillus subtilis, and various
species within the genera Pseudomonas, Streptomyces, and Staphylococcus, as well as Saccharomyces
cerevisiae, Schizosaccharomyces pombe, Drosophila S2 cells, Spodoptera Sf9 cells, Chinese hamster
ovary (CHO) cells, COS-7 lines of monkey kidney fibroblasts (Gluzman, Cell, 23:175 [1981]), C127,
3T3, 293, 293T, HeLa, and BHK cell lines.

The constructs in host cells can be used in a conventional manner to produce the gene product
encoded by the recombinant sequence. In some embodiments, introduction of the construct into the
host cell can be accomplished by calcium phosphate transfection, DEAE-Dextran mediated
transfection, or electroporation (see e.g., Davis et al., Basic Methods in Molecular Biology, [1986]).
Alternatively, in some embodiments of the present disclosure, the polypeptides of the disclosure can be
synthetically produced by conventional peptide synthesizers.

Proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control
of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins
using RNAs derived from the DNA constructs of the present disclosure. Appropriate cloning and
expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al.,
Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., [1989].

In some embodiments of the present disclosure, following transformation of a suitable host
strain and growth of the host strain to an appropriate cell density, the selected promoter is induced by
appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an
additional period. In other embodiments of the present disclosure, cells are typically harvested by
centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for
further purification. In still other embodiments of the present disclosure, microbial cells employed in
expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling,
sonication, mechanical disruption, or use of cell lysing agents.


3. Purification of RFX4_v3 

The present disclosure also provides methods for recovering and purifying RFX4_v3 from
recombinant cell cultures including, but not limited to, ammonium sulfate or ethanol precipitation, acid
extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic
interaction chromatography, affinity chromatography, hydroxylapatite chromatography, and lectin
chromatography. In other embodiments of the present disclosure, protein refolding steps can be used
as necessary in completing configuration of the mature protein. In still other embodiments of the
present disclosure, high performance liquid chromatography (HPLC) can be employed for final
purification steps.

The present disclosure further provides polynucleotides that can have the coding sequence
fused in frame to a marker sequence, which allows for purification of the polypeptide of the present
disclosure. A non-limiting example of a marker sequence is a hexahistidine tag, which may be supplied
by a vector such as a pQE-9 vector and which provides for purification of the polypeptide fused to the
marker in the case of a bacterial host. Alternatively, the marker sequence may be a hemagglutinin
(HA) tag when a mammalian host (e.g., COS-7 cells) is used. The HA tag corresponds to an epitope
derived from the influenza hemagglutinin protein (Wilson et al., Cell, 37:767 [1984]).
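Fusing a marker such as a hexahistidine or HA tag in frame amounts to inserting the tag codons without disturbing the reading frame. For illustration only, the Python sketch below places the tag at the C terminus, immediately before the stop codon. The text does not specify tag position, and vector-supplied tags may sit at either terminus. The HA codons shown are one possible encoding of the epitope YPYDVPDYA. The function name is hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")
HIS6 = "CATCACCATCACCATCAC"             # six His codons (CAT/CAC alternating)
HA_TAG = "TACCCATACGACGTGCCAGACTACGCT"  # Y P Y D V P D Y A (one possible encoding)


def codons(seq: str) -> list:
    """Split an in-frame sequence into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def add_c_terminal_tag(cds: str, tag: str) -> str:
    """Insert tag codons in frame immediately before the terminal stop codon."""
    cds, tag = cds.upper(), tag.upper()
    if len(cds) % 3 or len(tag) % 3:
        raise ValueError("CDS and tag lengths must be multiples of 3")
    if len(cds) < 6 or cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS must have a start region and end with a stop codon")
    if any(c in STOP_CODONS for c in codons(cds[:-3]) + codons(tag)):
        raise ValueError("in-frame stop codon found before the intended end")
    return cds[:-3] + tag + cds[-3:]


if __name__ == "__main__":
    print(add_c_terminal_tag("ATGGCTTAA", HIS6))  # hypothetical minimal ORF
```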

4. Truncation Mutants of RFX4_v3 

In addition, the present disclosure provides fragments of RFX4_v3 (i.e., truncation mutants).
In some embodiments of the present disclosure, when expression of a portion of the RFX4_v3 protein
is desired, it may be necessary to add a start codon (ATG) to the oligonucleotide fragment containing
the desired sequence to be expressed. It is well known in the art that a methionine at the N-terminal
position can be enzymatically cleaved by the use of the enzyme methionine aminopeptidase (MAP).
MAP has been cloned from E. coli (Ben-Bassat et al., J. Bacteriol., 169:751 [1987]) and Salmonella
typhimurium, and its in vitro activity has been demonstrated on recombinant proteins (Miller et al.,
Proc. Natl. Acad. Sci. USA 84:2718 [1990]). Therefore, removal of an N-terminal methionine, if
desired, can be achieved either in vivo by expressing such recombinant polypeptides in a host which
produces MAP (e.g., E. coli or CM89 or S. cerevisiae), or in vitro by use of purified MAP.
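Adding an initiating ATG to a fragment, as described above, can be sketched in Python. The helper below is hypothetical. It assumes the fragment is already in frame. It appends a stop codon when one is missing, which is an added assumption not stated in the text. It rejects fragments that contain premature in-frame stop codons. It does not predict whether MAP will later remove the added methionine.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def prepare_fragment(fragment: str, stop: str = "TAA") -> str:
    """Prefix ATG and, if needed, append a stop codon to an in-frame fragment."""
    frag, stop = fragment.upper(), stop.upper()
    if len(frag) % 3:
        raise ValueError("fragment must be an in-frame multiple of 3 nt")
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop} is not a standard stop codon")
    if not frag.startswith("ATG"):
        frag = "ATG" + frag  # add the initiating methionine codon
    if frag[-3:] not in STOP_CODONS:
        frag += stop  # close the reading frame
    internal = [frag[i:i + 3] for i in range(0, len(frag) - 3, 3)]
    if any(c in STOP_CODONS for c in internal):
        raise ValueError("fragment contains an in-frame stop codon before its end")
    return frag


if __name__ == "__main__":
    print(prepare_fragment("GCTGCT"))  # hypothetical two-codon fragment
```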

5. Fusion Proteins Containing RFX4_v3

The present disclosure also provides fusion proteins incorporating all or part of RFX4_v3.
Accordingly, in some embodiments of the present disclosure, the coding sequences for the polypeptide
can be incorporated as a part of a fusion gene including a nucleotide sequence encoding a different
polypeptide. It is contemplated that this type of expression system will find use under conditions
where it is desirable to produce an immunogenic fragment of an RFX4_v3 protein. In some
embodiments of the present disclosure, the VP6 capsid protein of rotavirus is used as an immunologic
carrier protein for portions of the RFX4_v3 polypeptide, either in the monomeric form or in the form of
a viral particle. In other embodiments of the present disclosure, the nucleic acid sequences
corresponding to the portion of RFX4_v3 against which antibodies are to be raised can be incorporated
into a fusion gene construct which includes coding sequences for a late vaccinia virus structural protein
to produce a set of recombinant viruses expressing fusion proteins comprising a portion of RFX4_v3 as
part of the virion. Work with immunogenic fusion proteins based on the hepatitis B surface antigen has
demonstrated that recombinant hepatitis B virions can also be utilized in this role. Similarly, in other
embodiments of the present disclosure, chimeric constructs coding for fusion proteins containing a
portion of RFX4_v3 and the poliovirus capsid protein are created to enhance immunogenicity of the set
of polypeptide antigens (see e.g., EP Publication No. 025949; and Evans et al., Nature, 339:385 [1989];
Huang et al., J. Virol., 62:3855 [1988]; and Schlienger et al., J. Virol., 66:2 [1992]).

In still other embodiments of the present disclosure, the multiple antigen peptide system for peptide-based immunization can be utilized. In this system, a desired portion of RFX4_v3 is obtained directly from organo-chemical synthesis of the peptide onto an oligomeric branching lysine core (see e.g., Posnett et al., J. Biol. Chem., 263:1719 [1988]; and Nardelli et al., J. Immunol., 148:914 [1992]). In other embodiments of the present disclosure, antigenic determinants of the RFX4_v3 proteins can also be expressed and presented by bacterial cells.

In addition to utilizing fusion proteins to enhance immunogenicity, it is widely appreciated that fusion proteins can also facilitate the expression of proteins, such as the RFX4_v3 protein of the present disclosure. Accordingly, in some embodiments of the present disclosure, RFX4_v3 can be generated as a glutathione-S-transferase (i.e., GST) fusion protein. It is contemplated that such GST fusion proteins will enable easy purification of RFX4_v3, such as by the use of glutathione-derivatized matrices (see e.g., Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, NY [1991]). In another embodiment of the present disclosure, a fusion gene coding for a purification leader sequence, such as a poly-(His)/enterokinase cleavage site sequence at the N-terminus of the desired portion of RFX4_v3, can allow purification of the expressed RFX4_v3 fusion protein by affinity chromatography using a Ni²⁺ metal resin. In still another embodiment of the present disclosure, the purification leader sequence can subsequently be removed by treatment with enterokinase (see e.g., Hochuli et al., J. Chromatogr., 411:177 [1987]; and Janknecht et al., Proc. Natl. Acad. Sci. USA, 88:8972).
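By way of illustration only, the enterokinase step can be modeled on a one-letter protein sequence. This sketch relies on the well-known enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys (DDDDK), which is cleaved on the C-terminal side of the Lys. The His6 leader, the target peptide and the function name `remove_leader` are hypothetical. The sketch ignores the secondary, non-canonical cleavage that real enzyme preparations can show.

```python
# Minimal sketch: removing an N-terminal poly-(His)/enterokinase leader in silico.
# Assumption: the leader ends with the enterokinase site DDDDK and the enzyme
# cuts immediately after the Lys; the sequences below are hypothetical.
ENTEROKINASE_SITE = "DDDDK"


def remove_leader(fusion: str) -> str:
    """Return the sequence C-terminal to the first enterokinase site."""
    index = fusion.find(ENTEROKINASE_SITE)
    if index == -1:
        raise ValueError("no enterokinase site found in the fusion protein")
    return fusion[index + len(ENTEROKINASE_SITE):]


# Hypothetical His6 leader, then the cleavage site, then an arbitrary target peptide.
fusion = "MHHHHHH" + ENTEROKINASE_SITE + "MAGSEQ"
print(remove_leader(fusion))  # prints "MAGSEQ"
```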

Techniques for making fusion genes are well known. Essentially, the joining of various DNA fragments coding for different polypeptide sequences is performed in accordance with conventional techniques. These include blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment of the present disclosure, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, in other embodiments of the present disclosure, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments. These fragments can subsequently be annealed to generate a chimeric gene sequence (see e.g., Current Protocols in Molecular Biology, supra).
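As a simplified sketch of the anchor-primer approach, the outcome of annealing two fragments through complementary overhangs can be modeled with string operations, in the manner of overlap extension. Several things are assumptions and not part of the disclosure: sequences are written 5'->3' on the coding strand, and the fragment sequences are hypothetical. The helper names `add_anchor_tail` and `splice_by_overlap` are also hypothetical.

```python
# Minimal in-silico sketch of joining two gene fragments through overhangs
# introduced by anchor primers (overlap-extension style).
# Assumption: annealing and polymerase extension are abstracted into string
# operations on 5'->3' coding-strand sequences.


def add_anchor_tail(fragment: str, upstream: str, overlap: int) -> str:
    """Prepend the last `overlap` bases of the upstream fragment, as an anchor-primer tail would."""
    return upstream[-overlap:] + fragment


def splice_by_overlap(left: str, right: str, overlap: int) -> str:
    """Join two fragments whose ends share an identical `overlap`-base region."""
    if overlap <= 0 or left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return left + right[overlap:]


fragment_a = "ATGGCTAAC"  # hypothetical 5' fragment
fragment_b = "GGATCCTGA"  # hypothetical 3' fragment
tailed_b = add_anchor_tail(fragment_b, fragment_a, overlap=3)  # "AACGGATCCTGA"
print(splice_by_overlap(fragment_a, tailed_b, overlap=3))  # prints "ATGGCTAACGGATCCTGA"
```

The overlap is carried only on the tail of the second fragment here. Primers that split the overlap between both fragments would give the same chimeric product.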

6. Variants of RFX4_v3 

Still other embodiments of the present disclosure provide mutant or variant forms of RFX4_v3 (i.e., muteins). It is possible to modify the structure of a peptide having an activity of RFX4_v3 for such purposes as enhancing therapeutic or prophylactic efficacy, or stability (e.g., ex vivo shelf life and/or resistance to proteolytic degradation in vivo). Such modified peptides are considered functional equivalents of peptides having an activity of the subject RFX4_v3 proteins as defined herein. A modified peptide can be produced in which the amino acid sequence has been altered, such as by amino acid substitution, deletion, or addition.

Moreover, as described above, variant forms (e.g., mutants) of the subject RFX4_v3 proteins are also contemplated as being equivalent to the peptides and DNA molecules that are set forth in more detail herein. For example, the present disclosure encompasses mutant and variant proteins that contain conservative or non-conservative amino acid substitutions.

This disclosure further contemplates a method of generating sets of combinatorial mutants of the present RFX4_v3 proteins, as well as truncation mutants. The method is especially useful for identifying potential variant sequences (i.e., homologs) that are functional. The purpose of screening such combinatorial libraries is to generate, for example, novel RFX4_v3 homologs that can act as either agonists or antagonists. Alternatively, such homologs may possess novel activities altogether, for example as a replacement therapy for a defective RFX4_v3 transcript to prevent phenotypic expression of congenital hydrocephalus.

Therefore, in some embodiments of the present disclosure, RFX4_v3 homologs are engineered by the present method to provide a more efficient transcription factor. In other embodiments of the present disclosure, combinatorially-derived homologs are generated which have a selective potency relative to a naturally occurring RFX4_v3. Such proteins, when expressed from recombinant DNA constructs, can be used in gene therapy protocols.

Still other embodiments of the present disclosure provide RFX4_v3 homologs that have intracellular half-lives dramatically different from that of the corresponding wild-type protein. For example, the altered protein can be rendered either more stable or less stable to proteolytic degradation or to other cellular processes that destroy or otherwise inactivate RFX4_v3. Such homologs, and the genes which encode them, can be utilized to alter the location of RFX4_v3 expression by modulating the half-life of the protein. For instance, a short half-life can give rise to more transient RFX4_v3 biological effects and, when part of an inducible expression system, can allow tighter control of RFX4_v3 levels within the cell. As above, such proteins, and particularly their recombinant nucleic acid constructs, can be used in gene therapy protocols.

In still other embodiments of the present disclosure, RFX4_v3 homologs are generated by the combinatorial approach to act as antagonists, in that they are able to interfere with the ability of the corresponding wild-type protein to regulate cell function. These antagonists may be useful in the controlled production of animal models with dose-dependent manifestations of hydrocephalus for further study.

In some embodiments of the combinatorial mutagenesis approach of the present disclosure, the amino acid sequences for a population of RFX4_v3 homologs or other related proteins are aligned, preferably to promote the highest homology possible. Such a population of variants can include, for example, RFX4_v3 homologs from one or more species, or RFX4_v3 homologs from the same species that differ due to mutation. Amino acids that appear at each position of the aligned sequences are selected to create a degenerate set of combinatorial sequences.
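By way of illustration, the position-by-position selection of residues from an alignment can be sketched as follows. The sketch assumes the homolog sequences are already aligned to equal length; gaps, if present, are written as "-" and treated as ordinary symbols. The toy sequences and function names are hypothetical and do not represent RFX4_v3.

```python
# Sketch: build a degenerate combinatorial set from aligned homolog sequences.
# Assumptions: sequences are pre-aligned and of equal length; the toy
# sequences below are hypothetical.
from itertools import product


def residues_per_position(aligned: list[str]) -> list[list[str]]:
    """Collect the distinct residues observed at each aligned position."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty set of equal-length aligned sequences")
    return [sorted({s[i] for s in aligned}) for i in range(len(aligned[0]))]


def combinatorial_library(aligned: list[str]) -> list[str]:
    """Enumerate every sequence combining the observed residues position by position."""
    return ["".join(combo) for combo in product(*residues_per_position(aligned))]


homologs = ["MKV", "MRV", "MKL"]  # hypothetical aligned fragments
print(residues_per_position(homologs))  # [['M'], ['K', 'R'], ['L', 'V']]
print(combinatorial_library(homologs))  # ['MKL', 'MKV', 'MRL', 'MRV']
```

The library size is the product of the number of residues observed at each position (here 1 × 2 × 2 = 4). That product grows quickly, which is why a degenerate oligonucleotide mixture that encodes the whole set at once is attractive compared with enumerating each member.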

In a preferred embodiment of the present disclosure, the combinatorial RFX4_v3 library is produced by way of a degenerate library of genes encoding a library of polypeptides, each of which includes at least a portion of potential RFX4_v3 protein sequences. For example, a mixture of synthetic oligonucleotides can be enzymatically ligated into gene sequences so that the degenerate set of potential RFX4_v3 sequences is expressible in either of two forms. These are individual polypeptides or, alternatively, a set of larger fusion proteins (e.g., for phage display) containing the set of RFX4_v3 sequences therein.
There are many ways by which the library of potential RFX4_v3 homologs can be generated from a degenerate oligonucleotide sequence. In some embodiments, chemical synthesis of a degenerate gene sequence is carried out in an automatic DNA synthesizer, and the synthetic genes are ligated into an appropriate gene for expression. The purpose of a degenerate set of genes is to provide, in one mixture, all of the sequences encoding the desired set of potential RFX4_v3 sequences. The synthesis of degenerate oligonucleotides is well known in the art (see e.g., Narang, Tetrahedron Lett., 39:39 [1983]; Itakura et al., Recombinant DNA, in Walton (ed.), Proceedings of the 3rd Cleveland Symposium on Macromolecules, Elsevier, Amsterdam, pp. 273-289 [1981]; Itakura et al., Annu. Rev. Biochem., 53:323 [1984]; Itakura et al., Science, 198:1056 [1984]; Ike et al., Nucl. Acid Res., 11:477 [1983]). Such techniques have been employed in the directed evolution of other proteins (see e.g., Scott et al., Science, 249:386 [1990]; Roberts et al., Proc. Natl. Acad. Sci. USA, 89:2429 [1992]; Devlin et al., Science, 249:404 [1990]; Cwirla et al., Proc. Natl. Acad. Sci. USA, 87:6378 [1990]; as well as U.S. Pat. Nos. 5,223,409; 5,198,346; and 5,096,815).

It is contemplated that the RFX4_v3 nucleic acids (e.g., SEQ ID NOS: 5, 7 and 9, and fragments and variants thereof) can be utilized as starting nucleic acids for directed evolution. These techniques can be utilized to develop RFX4_v3 variants having desirable properties, such as increased or decreased ability to compete with a naturally occurring defective transcript that induces congenital hydrocephalus.

In some embodiments, artificial evolution is performed by random mutagenesis (e.g., by utilizing error-prone PCR to introduce random mutations into a given coding sequence). This method requires that the frequency of mutation be finely tuned. As a general rule, beneficial mutations are rare, while deleterious mutations are common. Consequently, at too high a mutation frequency a beneficial mutation is likely to be accompanied by a deleterious one, and the combination often results in an inactive enzyme. The ideal number of base substitutions for a targeted gene is usually between 1.5 and 5 (Moore and Arnold, Nat. Biotech., 14:458 [1996]; Leung et al., Technique, 1:11 [1989]; Eckert and Kunkel, PCR Methods Appl., 1:17-24 [1991]; Caldwell and Joyce, PCR Methods Appl., 2:28 [1992]; and Zhao and Arnold, Nuc. Acids Res., 25:1307 [1997]). After mutagenesis, the resulting clones are selected for desirable activity (e.g., screened for RFX4_v3 activity). Successive rounds of mutagenesis and selection are often necessary to develop enzymes with desirable properties. Only the useful mutations are carried over to the next round of mutagenesis.
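To make the notion of a tunable mutation frequency concrete, the sketch below simulates error-prone copying. It targets a mean of 3 substitutions per gene, inside the 1.5 to 5 range cited above. The following simplifying assumptions are not part of the disclosure:

- only substitutions occur, with no insertions or deletions;
- each base mutates independently with probability `mean_subs / len(seq)`;
- a mutated base is replaced uniformly by one of the other three bases.

Real error-prone PCR has polymerase-specific biases. The gene sequence and function name are hypothetical.

```python
# Sketch of error-prone copying with a tunable mutation load.
# Assumptions: substitutions only, independent per-base mutation with rate
# mean_subs / len(seq), uniform choice among the three alternative bases.
import random

BASES = "ACGT"


def error_prone_copy(seq: str, mean_subs: float, rng: random.Random) -> tuple[str, int]:
    """Return a mutated copy of seq and the number of substitutions introduced."""
    rate = mean_subs / len(seq)
    copy, n_subs = [], 0
    for base in seq:
        if rng.random() < rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
            n_subs += 1
        else:
            copy.append(base)
    return "".join(copy), n_subs


rng = random.Random(0)
gene = "ATGC" * 250  # hypothetical 1,000-base coding sequence
loads = [error_prone_copy(gene, 3.0, rng)[1] for _ in range(500)]
print(sum(loads) / len(loads))  # averages close to the 3.0 target
```

The number of substitutions per clone varies around the mean. Any given clone may therefore carry zero mutations or many, which is one reason successive rounds of screening are needed.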

In other embodiments of the present disclosure, the polynucleotides of the present disclosure are used in gene shuffling or sexual PCR procedures (e.g., Smith, Nature, 370:324 [1994]; U.S. Pat. Nos. 5,837,458; 5,830,721; 5,811,238; 5,733,731). Gene shuffling involves random fragmentation of several mutant DNAs followed by their reassembly by PCR into full-length molecules. Examples of various gene shuffling procedures include, but are not limited to, assembly following DNase treatment, the staggered extension process (STEP), and random priming in vitro recombination. In the DNase-mediated method, DNA segments isolated from a pool of positive mutants are cleaved into random fragments with DNase I and subjected to multiple rounds of PCR with no added primer. The lengths of the random fragments approach that of the uncleaved segment as the PCR cycles proceed. As a result, mutations present in different clones become mixed and accumulate in some of the resulting sequences. Multiple cycles of selection and shuffling have led to the functional enhancement of several enzymes (Stemmer, Nature, 370:398 [1994]; Stemmer, Proc. Natl. Acad. Sci. USA, 91:10747 [1994]; Crameri et al., Nat. Biotech., 14:315 [1996]; Zhang et al., Proc. Natl. Acad. Sci. USA, 94:4504 [1997]; and Crameri et al., Nat. Biotech., 15:436 [1997]).
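The practical effect of shuffling, namely that mutations from different clones become mixed, can be illustrated with a deliberately crude model. In this model a child sequence is read from one parent and may switch template at random positions. This is an assumption-laden abstraction and not a simulation of DNase I fragmentation, primerless PCR or reassembly. It assumes equal-length, aligned parents. The parent sequences, switch probability and function name are hypothetical.

```python
# Highly simplified model of DNA shuffling outcomes: each child is read from
# one parent and may switch template at random positions, so mutations from
# different parents can be combined. Fragmentation, annealing, and PCR are not
# modeled; parents are assumed to be equal-length, aligned variants.
import random


def shuffle_parents(parents: list[str], switch_prob: float, rng: random.Random) -> str:
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    template = rng.randrange(len(parents))
    child = []
    for i in range(length):
        # At each position, possibly jump to another (or the same) parent.
        if rng.random() < switch_prob:
            template = rng.randrange(len(parents))
        child.append(parents[template][i])
    return "".join(child)


wild = "ACGTACGTACGTACGTACGT"
mutant_1 = wild[:3] + "A" + wild[4:]    # hypothetical substitution at position 3
mutant_2 = wild[:15] + "C" + wild[16:]  # hypothetical substitution at position 15
rng = random.Random(7)
pool = [shuffle_parents([mutant_1, mutant_2], 0.1, rng) for _ in range(200)]
print(sum(1 for child in pool if child[3] == "A" and child[15] == "C"))  # recombinants carrying both
```

Only some children carry both mutations. This mirrors the statement above that mutations "accumulate in some of the resulting sequences", after which selection identifies the useful combinations.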

A wide range of techniques are known in the art for screening gene products of combinatorial libraries made by point mutations, and for screening cDNA libraries for gene products having a certain property. Such techniques will be generally adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis or recombination of RFX4_v3 homologs. The most widely used techniques for screening large gene libraries typically comprise three steps. The gene library is cloned into replicable expression vectors, and appropriate cells are transformed with the resulting library of vectors. The combinatorial genes are then expressed under conditions in which detection of a desired activity facilitates relatively easy isolation of the vector encoding the gene whose product was detected.

7. Chemical Synthesis of RFX4_v3 

In an alternate embodiment of the disclosure, the coding sequence of RFX4_v3 is synthesized, in whole or in part, using chemical methods well known in the art (see e.g., Caruthers et al., Nucl. Acids Res. Symp. Ser., 7:215 [1980]; Crea and Horn, Nucl. Acids Res., 9:2331 [1980]; Matteucci and Caruthers, Tetrahedron Lett., 21:719 [1980]; and Chow and Kempe, Nucl. Acids Res., 9:2807 [1981]). In other embodiments of the present disclosure, the protein itself is produced using chemical methods to synthesize either an entire RFX4_v3 amino acid sequence or a portion thereof. For example, peptides can be synthesized by solid-phase techniques, cleaved from the resin, and purified by preparative high performance liquid chromatography (see e.g., Creighton, Proteins Structures And Molecular Principles, W H Freeman and Co, New York N.Y. [1983]). In other embodiments of the present disclosure, the composition of the synthetic peptides is confirmed by amino acid analysis or sequencing (see e.g., Creighton, supra).

Direct peptide synthesis can be performed using various solid-phase techniques (Roberge et al., Science, 269:202 [1995]). Automated synthesis may be achieved, for example, using an ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer. Additionally, the amino acid sequence of RFX4_v3, or any part thereof, may be altered during direct synthesis and/or combined using chemical methods with other sequences to produce a variant polypeptide.

III. Detection of RFX4_v3 Alleles
A. RFX4_v3 Alleles 

In some embodiments, the present disclosure includes alleles of RFX4_v3 that increase a subject's susceptibility to congenital hydrocephalus (e.g., including, but not limited to, sequences described above with the epoxygenase transgene insert). In some embodiments, subjects (e.g., human subjects) with an increased susceptibility to congenital hydrocephalus are identified through the detection of mutant RFX4_v3 alleles. Any mutation that results in the undesired phenotype is within the scope of the present disclosure.

For example, in some embodiments, the present disclosure provides single-nucleotide 
polymorphisms of RFX4_v3 that produce varying levels of expression of the congenital hydrocephalus 
phenotype compared to the wild-type sequence. 



B. Detection of RFX4_v3 Alleles 

Accordingly, the present disclosure provides methods for determining whether a subject has an increased susceptibility to congenital hydrocephalus by determining whether the individual has a mutated gene. In other embodiments, the present disclosure provides methods for providing a prognosis of increased risk for congenital hydrocephalus to an individual based on the presence or absence of one or more mutations. In some embodiments, the mutation is in the RFX4_v3 gene. In other embodiments, the mutation manifests as dose-dependent congenital hydrocephalus. In some embodiments, the mutation is a single nucleotide polymorphism caused by an insertion of any number of residues or a single nucleotide substitution. In other embodiments, the mutation can result from multiple nucleotide polymorphisms caused by an insertion of any number of residues or a single nucleotide substitution into the RFX4_v3 transcript.

In still further embodiments, the detection of polymorphisms is not limited to the RFX4_v3 transcript. Since RFX4_v1 and RFX4_v2 each have exons in common with RFX4_v3 (see Fig. 2), detection of polymorphisms in any of the common exons provides additional methods for detecting an increased susceptibility to congenital hydrocephalus.

To perform a diagnostic test for the presence or absence of a mutation in an RFX4_v3 sequence of an individual, a suitable genomic DNA-containing sample from a subject is obtained and the DNA extracted using conventional techniques. For instance, a blood sample, a buccal swab, a hair follicle preparation, a nasal aspirate, a cerebrospinal fluid sample, or an amniotic fluid sample is used as a source of cells to provide the DNA sample. Similarly, a surgical specimen, such as a brain tissue biopsy, or other biological sample containing genomic DNA could be used. The extracted DNA is then subjected to amplification, for example according to standard procedures. The allele of the single base-pair mutation is determined by conventional methods. These include:

- manual and automated fluorescent DNA sequencing;
- primer extension methods (Nikiforov et al., Nucl. Acids Res. 22:4167-4175, 1994);
- oligonucleotide ligation assay (OLA) (Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990);
- allele-specific PCR methods (Rust et al., Nucl. Acids Res. 6:3623-3629, 1993);
- RNase mismatch cleavage;
- single strand conformation polymorphism (SSCP);
- denaturing gradient gel electrophoresis (DGGE);
- TaqMan™;
- oligonucleotide hybridization, and the like.

Also, see the following U.S. Patents for descriptions of methods or applications of polymorphism analysis to disease prediction and/or diagnosis: 4,666,828 (RFLP for Huntington's); 4,801,531 (prediction of atherosclerosis); 5,110,920 (HLA typing); 5,268,267 (prediction of small cell carcinoma); and 5,387,506 (prediction of dysautonomia).
In general, assays for detecting polymorphisms or mutations fall into several categories, including, but not limited to, direct sequencing assays, fragment polymorphism assays, hybridization assays, and computer-based data analysis. Protocols and commercially available kits or services for performing multiple variations of these assays are available. In some embodiments, assays are performed in combination or in hybrid (e.g., different reagents or technologies from several assays are combined to yield one assay). The following assays are useful in the present disclosure.

1. Direct Sequencing Assays

In some embodiments of the present disclosure, polymorphisms are detected using a direct sequencing technique. In these assays, DNA samples are first isolated from a subject using any suitable method. In some embodiments, the RFX4_v3 gene or any part thereof is cloned into a suitable vector and amplified by growth in a host cell (e.g., a bacterium). In other embodiments, DNA in the RFX4_v3 gene or any part thereof is amplified using PCR.

Following amplification, DNA in the RFX4_v3 gene or any part thereof (e.g., the region containing the polymorphism or mutation of interest) is sequenced using any suitable method, including but not limited to manual sequencing using radioactive marker nucleotides, or automated sequencing. The results of the sequencing are displayed using any suitable method. The sequence is examined and the presence or absence of a given polymorphism or mutation is determined.

2. PCR Assay

In some embodiments of the present disclosure, polymorphisms are detected using a PCR-based assay. In some embodiments, the PCR assay comprises the use of oligonucleotide primers that hybridize only to the mutant or wild-type allele of RFX4_v3 (e.g., to the region of polymorphism). Both sets of primers are used to amplify a sample of DNA. If only the mutant primers result in a PCR product, then the patient has the mutant RFX4_v3 allele. If only the wild-type primers result in a PCR product, then the patient has the wild-type allele of RFX4_v3.
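The interpretation of the two allele-specific reactions can be written as a small decision function. The two single-product outcomes follow the text above. Two further readings are standard assumptions added here for completeness and are not stated in the disclosure: products from both primer sets indicate a heterozygous sample (one copy of each allele in a diploid genome), and no product from either set is treated as a failed reaction. The function name is hypothetical.

```python
# Sketch of allele-specific PCR interpretation.
# The single-product cases follow the text; the "both products" (heterozygous)
# and "no product" (failed reaction) cases are added assumptions.


def call_allele_specific_pcr(mutant_product: bool, wild_type_product: bool) -> str:
    if mutant_product and wild_type_product:
        return "heterozygous (mutant and wild-type alleles)"
    if mutant_product:
        return "mutant allele only"
    if wild_type_product:
        return "wild-type allele only"
    return "no call (repeat the reaction)"


print(call_allele_specific_pcr(mutant_product=True, wild_type_product=False))
# prints "mutant allele only"
```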

3. Fragment Length Polymorphism Assays 

In some embodiments of the present disclosure, polymorphisms are detected using a fragment length polymorphism assay. In a fragment length polymorphism assay, a unique DNA banding pattern based on cleaving the DNA at a series of positions is generated using an enzyme (e.g., a restriction enzyme or a CLEAVASE I [Third Wave Technologies, Madison, WI] enzyme). DNA fragments from a sample containing a polymorphism or a mutation will have a different banding pattern than wild-type.

a. RFLP Assay

In some embodiments of the present disclosure, polymorphisms are detected using a restriction fragment length polymorphism (RFLP) assay. The RFX4_v3 gene or any part thereof is first isolated using PCR. The PCR products are then cleaved with restriction enzymes known to give a unique length fragment for a given polymorphism. The restriction-enzyme-digested PCR products are separated by agarose gel electrophoresis and visualized by ethidium bromide staining. The length of the fragments is compared to molecular weight markers and to fragments generated from wild-type and mutant controls.
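The comparison of wild-type and mutant banding patterns can be anticipated with an in-silico digest. In this sketch EcoRI (recognition site GAATTC, cut between G and A) is used purely as an example enzyme. The enzyme actually chosen would depend on the polymorphism, and the amplicon sequences are hypothetical. Because GAATTC is palindromic, scanning one strand finds every site. The sketch also assumes a linear product and complete digestion.

```python
# Sketch of an in-silico restriction digest for RFLP interpretation.
# Assumptions: one palindromic recognition site (EcoRI, G^AATTC, used only as
# an example), a linear PCR product, and complete digestion.


def digest_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


wild_type = "AAAAGAATTCAAAA"  # hypothetical amplicon containing the site
mutant = "AAAAGAGTTCAAAA"     # hypothetical SNP destroys the site
print(digest_lengths(wild_type, "GAATTC", 1))  # [5, 9]
print(digest_lengths(mutant, "GAATTC", 1))     # [14]
```

Here a single substitution that destroys the site collapses two bands into one. This is the kind of length difference read off the gel against the wild-type and mutant controls.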

b. CFLP Assay 

In other embodiments, polymorphisms are detected using a CLEAVASE fragment length polymorphism assay (CFLP; Third Wave Technologies, Madison, WI; see e.g., U.S. Patent Nos. 5,843,654; 5,843,669; 5,719,208; and 5,888,780). This assay is based on the observation that when single strands of DNA fold on themselves, they assume higher order structures that are highly specific to the precise sequence of the DNA molecule. These secondary structures involve partially duplexed regions of DNA such that single-stranded regions are juxtaposed with double-stranded DNA hairpins. The CLEAVASE I enzyme is a structure-specific, thermostable nuclease that recognizes and cleaves the junctions between these single-stranded and double-stranded regions.

The region of interest is first isolated, for example, using PCR. Then, DNA strands are separated by heating. Next, the reactions are cooled to allow intrastrand secondary structure to form. The PCR products are then treated with the CLEAVASE I enzyme to generate a series of fragments that are unique to a given polymorphism or mutation. The CLEAVASE enzyme-treated PCR products are separated and detected (e.g., by agarose gel electrophoresis) and visualized (e.g., by ethidium bromide staining). The length of the fragments is compared to molecular weight markers and to fragments generated from wild-type and mutant controls.


4. Hybridization Assays 

In preferred embodiments of the present disclosure, polymorphisms are detected in a hybridization assay. In a hybridization assay, the presence or absence of a given polymorphism or mutation is determined based on the ability of the DNA from the sample to hybridize to a complementary DNA molecule (e.g., an oligonucleotide probe). A variety of hybridization assays using a variety of technologies for hybridization and detection are available. A description of a selection of assays is provided below.

a. Direct Detection of Hybridization 

In some embodiments, hybridization of a probe to the sequence of interest (e.g., a polymorphism or mutation) is detected directly by visualizing a bound probe (e.g., a Northern or Southern assay; see e.g., Ausubel et al. (eds.), Current Protocols in Molecular Biology, John Wiley & Sons, NY [1991]). In these assays, genomic DNA (Southern) or RNA (Northern) is isolated from a subject. The DNA or RNA is then cleaved with a series of restriction enzymes that cleave infrequently in the genome and not near any of the markers being assayed. The DNA or RNA is then separated (e.g., on an agarose gel) and transferred to a membrane. A labeled (e.g., by incorporating a radionucleotide) probe or probes specific for the polymorphism or mutation being detected are allowed to contact the membrane under low, medium, or high stringency conditions. Unbound probe is removed and the presence of binding is detected by visualizing the labeled probe.

b. Detection of Hybridization Using "DNA Chip" Assays

In some embodiments of the present disclosure, polymorphisms are detected using a DNA chip hybridization assay. In this assay, a series of oligonucleotide probes are affixed to a solid support. The oligonucleotide probes are designed to be unique to a given polymorphism or mutation. The DNA sample of interest is contacted with the DNA "chip" and hybridization is detected.

In some embodiments, the DNA chip assay is a GeneChip (Affymetrix, Santa Clara, CA; see e.g., U.S. Patent Nos. 6,045,996; 5,925,525; and 5,858,659) assay. The GeneChip technology uses miniaturized, high-density arrays of oligonucleotide probes affixed to a "chip." Probe arrays are manufactured by Affymetrix's light-directed chemical synthesis process, which combines solid-phase chemical synthesis with photolithographic fabrication techniques employed in the semiconductor industry. Using a series of photolithographic masks to define chip exposure sites, followed by specific chemical synthesis steps, the process constructs high-density arrays of oligonucleotides, with each probe in a predefined position in the array. Multiple probe arrays are synthesized simultaneously on a large glass wafer. The wafers are then diced, and individual probe arrays are packaged in injection-molded plastic cartridges, which protect them from the environment and serve as chambers for hybridization.

The nucleic acid to be analyzed is isolated, amplified by PCR, and labeled with a fluorescent reporter group. The labeled DNA is then incubated with the array using a fluidics station. The array is then inserted into the scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the target, which is bound to the probe array. Probes that perfectly match the target generally produce stronger signals than those that have mismatches. Since the sequence and position of each probe on the array are known, the identity of the target nucleic acid applied to the probe array can be determined by complementarity.
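The statement that perfectly matched probes give stronger signals, and that target identity follows from complementarity, can be illustrated by scoring each probe by its mismatches against a target window. This is a simplifying assumption: mismatch count stands in for hybridization signal, whereas real arrays are read from fluorescence intensities. The target, the probes and all function names are hypothetical.

```python
# Sketch: deciding which array probe best matches a target by complementarity.
# Assumption: mismatch count stands in for hybridization signal (fewer
# mismatches, stronger signal); sequences are written 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(probe: str, target: str) -> int:
    """Count mismatches when the probe pairs antiparallel with the target."""
    expected = reverse_complement(target)  # the perfect-match probe, 5'->3'
    if len(probe) != len(expected):
        raise ValueError("probe and target window must be the same length")
    return sum(a != b for a, b in zip(probe, expected))


def best_probe(probes: dict[str, str], target: str) -> str:
    return min(probes, key=lambda name: mismatches(probes[name], target))


target = "GATTACA"  # hypothetical target window
probes = {"wild-type": "TGTAATC", "mutant": "TGTGATC"}  # hypothetical probes
print(best_probe(probes, target))  # prints "wild-type"
```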

In other embodiments, a DNA microchip containing electronically captured probes (Nanogen, San Diego, CA) is utilized (see e.g., U.S. Patent Nos. 6,017,696; 6,068,818; and 6,051,380). Through the use of microelectronics, Nanogen's technology enables the active movement and concentration of charged molecules to and from designated test sites on its semiconductor microchip. DNA capture probes unique to a given polymorphism or mutation are electronically placed at, or "addressed" to, specific sites on the microchip. Since DNA has a strong negative charge, it can be electronically moved to an area of positive charge.

First, a test site or a row of test sites on the microchip is electronically activated with a positive charge. Next, a solution containing the DNA probes is introduced onto the microchip. The negatively charged probes rapidly move to the positively charged sites, where they concentrate and are chemically bound to a site on the microchip. The microchip is then washed and another solution of distinct DNA probes is added until the array of specifically bound DNA probes is complete.

A test sample is then analyzed for the presence of target DNA molecules by determining which of the DNA capture probes hybridize with complementary DNA in the test sample (e.g., a PCR-amplified RFX4_v3 gene). An electronic charge is also used to move and concentrate target molecules to one or more test sites on the microchip. The electronic concentration of sample DNA at each test site promotes rapid hybridization of sample DNA with complementary capture probes (hybridization may occur in minutes). To remove any unbound or nonspecifically bound DNA from each site, the polarity or charge of the site is reversed to negative, thereby forcing any unbound or nonspecifically bound DNA back into solution away from the capture probes. A laser-based fluorescence scanner is used to detect binding.

In still further embodiments, an array technology based upon the segregation of fluids on a flat surface (chip) by differences in surface tension (ProtoGene, Palo Alto, CA) is utilized (see e.g., U.S. Patent Nos. 6,001,311; 5,985,551; and 5,474,796). ProtoGene's technology is based on the fact that fluids can be segregated on a flat surface by differences in surface tension that have been imparted by chemical coatings. Once so segregated, oligonucleotide probes are synthesized directly on the chip by ink-jet printing of reagents. The array, with its reaction sites defined by surface tension, is mounted on an X/Y translation stage under a set of four piezoelectric nozzles, one for each of the four standard DNA bases. The translation stage moves along each of the rows of the array and the appropriate reagent is delivered to each reaction site. For example, the A amidite is delivered only to the sites where amidite A is to be coupled during that synthesis step, and so on. Common reagents and washes are delivered by flooding the entire surface and are then removed by centrifugation.

DNA probes unique for the polymorphism or mutation of interest are affixed to the chip using ProtoGene's technology. The chip is then contacted with the PCR-amplified genes of interest. Following hybridization, unbound DNA is removed and hybridization is detected using any suitable method (e.g., by fluorescence de-quenching of an incorporated fluorescent group).

In yet other embodiments, a "bead array" is used for the detection of polymorphisms (Illumina, San Diego, CA; see e.g., PCT Publications WO 99/67641 and WO 00/39587). Illumina uses a BEAD ARRAY technology that combines fiber optic bundles and beads that self-assemble into an array. Each fiber optic bundle contains thousands to millions of individual fibers, depending on the diameter of the bundle. The beads are coated with an oligonucleotide specific for the detection of a given polymorphism or mutation. Batches of beads are combined to form a pool specific to the array. To perform an assay, the BEAD ARRAY is contacted with a prepared subject sample (e.g., DNA). Hybridization is detected using any suitable method.

c. Enzymatic Detection of Hybridization

In some embodiments of the present disclosure, genomic profiles are generated using an assay that detects hybridization by enzymatic cleavage of specific structures (INVADER assay, Third Wave Technologies; see e.g., U.S. Patent Nos. 5,846,717; 6,090,543; 6,001,567; 5,985,557; and 5,994,069; each of which is herein incorporated by reference). The INVADER assay detects specific DNA and RNA sequences by using structure-specific enzymes to cleave a complex formed by the hybridization of overlapping oligonucleotide probes. Elevated temperature and an excess of one of the probes enable multiple probes to be cleaved for each target sequence present without temperature cycling. These cleaved probes then direct cleavage of a second labeled probe. The secondary probe oligonucleotide can be 5'-end labeled with fluorescein that is quenched by an internal dye. Upon cleavage, the de-quenched fluorescein-labeled product may be detected using a standard fluorescence plate reader.
The INVADER assay detects specific mutations and polymorphisms in unamplified genomic
DNA. The isolated DNA sample is contacted with the first probe, specific for either a
polymorphism/mutation or the wild-type sequence, and allowed to hybridize. Then a secondary probe,
specific to the first probe and containing the fluorescein label, is hybridized and the enzyme is added.
Binding is detected using a fluorescence plate reader by comparing the signal of the test sample to
known positive and negative controls.
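The comparison against controls described above can be sketched as a simple normalization. This is a minimal illustration only, assuming single-channel plate-reader readings in arbitrary fluorescence units and a hypothetical call cutoff of 0.5 (halfway between the negative- and positive-control means); neither the cutoff nor the function name `call_invader_signal` comes from the INVADER assay itself.

```python
from statistics import mean


def call_invader_signal(test: float, negatives: list[float], positives: list[float],
                        cutoff: float = 0.5) -> tuple[float, bool]:
    """Scale a test well's fluorescence between the control means and make a call.

    Hypothetical helper: 0.0 corresponds to the negative-control mean and 1.0 to
    the positive-control mean. Returns (scaled signal, signal detected?).
    """
    neg, pos = mean(negatives), mean(positives)
    if pos <= neg:
        raise ValueError("positive controls must exceed negative controls")
    scaled = (test - neg) / (pos - neg)
    return scaled, scaled >= cutoff


# Illustrative readings only (arbitrary fluorescence units).
scaled, detected = call_invader_signal(900.0, [100.0, 120.0, 110.0], [1500.0, 1450.0, 1550.0])
```

Scaling against both control sets, rather than using a fixed raw threshold, keeps the call insensitive to plate-to-plate differences in overall signal.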

In some embodiments, hybridization of a bound probe is detected using a TaqMan assay (PE 
Biosystems, Foster City, CA; see e.g., U.S. Patent Nos. 5,962,233 and 5,538,848). The assay is 
performed during a PCR reaction. The TaqMan assay exploits the 5'-3' exonuclease activity of the 
AMPLITAQ GOLD DNA polymerase. A probe, specific for a given allele or mutation, is included in 
the PCR reaction. The probe consists of an oligonucleotide with a 5'-reporter dye (e.g., a fluorescent
dye) and a 3'-quencher dye. During PCR, if the probe is bound to its target, the 5'-3' nucleolytic 
activity of the AMPLITAQ GOLD polymerase cleaves the probe between the reporter and the 
quencher dye. The separation of the reporter dye from the quencher dye results in an increase of 
fluorescence. The signal accumulates with each cycle of PCR and can be monitored with a 
fluorometer. 
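Because the reporter signal accumulates with each PCR cycle, a real-time trace is commonly read as a threshold cycle (often abbreviated Ct): the first cycle at which fluorescence crosses a fixed threshold. The sketch below assumes an idealized model in which signal is proportional to amplicon copy number and grows by a constant factor (1 + efficiency) per cycle. Real instruments fit baseline-corrected curves, so this is illustrative only, and `threshold_cycle` is a hypothetical name.

```python
from typing import Optional


def threshold_cycle(initial_signal: float, efficiency: float, threshold: float,
                    max_cycles: int = 40) -> Optional[int]:
    """Return the first cycle whose modeled signal reaches `threshold`.

    Idealized model: signal after cycle n = initial_signal * (1 + efficiency) ** n.
    Returns None if the threshold is not reached within `max_cycles`.
    """
    signal = initial_signal
    for cycle in range(1, max_cycles + 1):
        signal *= 1.0 + efficiency  # one round of amplification and probe cleavage
        if signal >= threshold:
            return cycle
    return None


print(threshold_cycle(1e-6, 1.0, 1.0))  # 20
print(threshold_cycle(1e-5, 1.0, 1.0))  # 17
```

At 100% efficiency, a tenfold larger starting signal crosses the threshold about log2(10), roughly 3.3 cycles earlier; the model reports whole cycles, hence 17 versus 20.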

In still further embodiments, polymorphisms are detected using the SNP-IT primer extension 
assay (Orchid Biosciences, Princeton, NJ; see e.g., U.S. Patent Nos. 5,952,174 and 5,919,626). In this
assay, single nucleotide polymorphisms (SNPs) are identified by using a specially synthesized DNA 
primer and a DNA polymerase to selectively extend the DNA chain by one base at the suspected SNP 
location. DNA in the region of interest is amplified and denatured. Polymerase reactions are then 
performed in miniaturized microfluidic systems. Detection is accomplished by adding a
label to the nucleotide suspected of being at the SNP or mutation location. Incorporation of the label 
into the DNA can be detected by any suitable method (e.g., if the nucleotide contains a biotin label, 
detection is via a fluorescently labeled antibody specific for biotin). 
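The single-base extension step can be sketched in a few lines. The sketch assumes the template strand is written 3' to 5' so that it lines up base-for-base with the primer written 5' to 3'. It also assumes the primer ends immediately before the SNP position and that only one (chain-terminating) base is added. The function name is hypothetical, and the SNP-IT chemistry itself is not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def single_base_extension(template_3to5: str, primer_5to3: str) -> str:
    """Return the base a polymerase would add to the primer's 3' end.

    `template_3to5` is the template strand written 3'->5', aligned so that
    primer base i pairs with template base i. The primer must end one base
    before the SNP; the added base is the complement of the template base
    at the SNP position.
    """
    k = len(primer_5to3)
    if k >= len(template_3to5):
        raise ValueError("template must extend past the primer's 3' end")
    expected = "".join(COMPLEMENT[b] for b in template_3to5[:k])
    if primer_5to3 != expected:
        raise ValueError("primer is not fully complementary to the template")
    return COMPLEMENT[template_3to5[k]]
```

Two alleles at the same site differ only in the template base at position k. For example, templates "TACGGAT" and "TACGGAC" with primer "ATGCCT" yield different extension products ("A" versus "G"), which the label then distinguishes.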

5. Mass Spectrometry Assay

In some embodiments, a MassARRAY system (Sequenom, San Diego, CA) is used to detect 
polymorphisms (see e.g., U.S. Patent Nos. 6,043,031; 5,777,324; and 5,605,798). DNA is isolated 
from blood samples using standard procedures. Next, specific DNA regions containing the mutation or 
SNP of interest, about 200 base pairs in length, are amplified by PCR. The amplified fragments are 
then attached by one strand to a solid surface and the non-immobilized strands are removed by standard 
denaturation and washing. The remaining immobilized single strand then serves as a template for 
automated enzymatic reactions that produce genotype specific diagnostic products. 

Very small quantities of the enzymatic products, typically five to ten nanoliters, are then
transferred to a SpectroCHIP array for subsequent automated analysis with the SpectroREADER mass
spectrometer. Each spot is preloaded with light-absorbing crystals that form a matrix with the
dispensed diagnostic product. The MassARRAY system uses MALDI-TOF (Matrix Assisted Laser
Desorption Ionization - Time of Flight) mass spectrometry. In a process known as desorption, the
matrix is hit with a pulse from a laser beam. Energy from the laser beam is transferred to the matrix,
which is vaporized, resulting in a small amount of the diagnostic product being expelled into a flight
tube. Because the diagnostic product is charged, when an electrical field pulse is subsequently applied
to the tube the product molecules are launched down the flight tube towards a detector. The time
between application of the electrical field pulse and collision of the diagnostic product with the detector
is referred to as the time of flight. This is a very precise measure of the product's molecular weight,
as a molecule's mass correlates directly with its time of flight, with smaller molecules flying faster than
larger molecules. The entire assay is completed in less than one thousandth of a second, enabling
samples to be analyzed in a total of 3-5 seconds including repetitive data collection. The SpectroTYPER
software then calculates, records, compares and reports the genotypes at the rate of three seconds per
sample.
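The mass/time-of-flight relationship follows from energy conservation for an ion of charge ze accelerated through a potential U: zeU = mv^2/2. For a field-free flight path of length L, this gives t = L * sqrt(m / (2zeU)). The sketch below evaluates this idealized linear-TOF formula. The 1 m flight path and 20 kV accelerating voltage are illustrative assumptions, not SpectroREADER specifications, and real instruments calibrate against standards of known mass.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs (exact SI value)
DALTON_KG = 1.66053906660e-27  # kilograms per unified atomic mass unit


def time_of_flight_s(mass_da: float, charge: int = 1,
                     flight_length_m: float = 1.0, voltage_v: float = 20_000.0) -> float:
    """Idealized linear time of flight: t = L * sqrt(m / (2 z e U))."""
    mass_kg = mass_da * DALTON_KG
    return flight_length_m * math.sqrt(mass_kg / (2 * charge * ELEMENTARY_CHARGE_C * voltage_v))


for mass in (5_000.0, 6_000.0, 7_000.0):  # illustrative masses in the range of short oligonucleotides
    print(f"{mass:>7.0f} Da -> {time_of_flight_s(mass) * 1e6:.1f} us")
```

Because t grows with the square root of mass, quadrupling the mass only doubles the flight time. Under these assumptions, flight times are tens of microseconds, consistent with the sub-millisecond timescale noted above.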


6. Mutant Analysis by Differential Detection of RFX4_v3 Homologs
With the provision herein of the unique N-terminus of human, mouse and zebrafish RFX4_v3
homologs, it is now possible to design and/or construct specific binding molecules, such as nucleic acid
probes or antibodies, to specifically identify RFX4_v3 homologs. Such RFX4_v3-specific binding
molecules are useful, for example, to distinguish RFX4_v3 homologs from related RFX4 variants (e.g.,
RFX4_v1 and RFX4_v2).

In some embodiments, the antibodies used discriminate between mutant (i.e., truncated
proteins) and wild-type proteins (SEQ ID NOS: 6, 8, and 10). In some other embodiments, the
antibodies are directed to the C-terminus of RFX4_v3 or the N-terminus of RFX4_v3. In other
embodiments, the antibodies are directed to the first 14 amino acids at the N-terminus of RFX4_v3
(e.g., SEQ ID NOS: 33, 34 or 35). In certain embodiments, the antibodies are directed to the
Reissner's fibers of the subcommissural organ. Production and use of RFX4_v3 antibodies is
discussed in detail below in the section entitled "Generation of RFX4_v3 Antibodies."

In other embodiments, probes are used that discriminate between mutant (i.e., truncated
proteins) and wild-type proteins (SEQ ID NOS: 6, 8, and 10). For example, in some embodiments
probes are directed to the C-terminus of RFX4_v3 or the N-terminus of RFX4_v3. In other
embodiments, probes are directed to the first 14 amino acids at the N-terminus of RFX4_v3 (e.g., SEQ
ID NOS: 33, 34 or 35).

The preparation and use of nucleic acid probes are well known in the art. For discussions of
nucleic acid probe design and hybridization conditions, see, e.g., Molecular Cloning: A Laboratory
Manual (2nd Ed.), Vols. 1-3, Sambrook, ed., Cold Spring Harbor Laboratory (1989); Current
Protocols In Molecular Biology, Ausubel, ed., John Wiley & Sons, Inc., New York (1997); Laboratory
Techniques In Biochemistry And Molecular Biology: Hybridization With Nucleic Acid Probes, Part I.
Theory and Nucleic Acid Preparation, Tijssen, ed., Elsevier, N.Y. (1993).


7. Kits for Analyzing Risk of Congenital Hydrocephalus 

The present disclosure also provides kits for determining whether an individual carries a
wild-type or mutant allele of RFX4_v3. In some embodiments, the kits are useful for determining
whether the subject is at risk of passing on a defective RFX4_v3 gene resulting in children with
congenital hydrocephalus. The diagnostic kits are produced in a variety of ways. In some
embodiments, the kits contain at least one reagent for specifically detecting a mutant RFX4_v3 allele or
protein. In preferred embodiments, the reagent is a nucleic acid that hybridizes to nucleic acids
containing a polymorphism and does not bind to nucleic acids that do not contain the polymorphism. In
other preferred embodiments, the reagents are primers for amplifying the region of DNA containing a
polymorphism. In still other embodiments, the reagents are antibodies that preferentially bind either
the wild-type or truncated RFX4_v3 proteins. In some embodiments, the kit contains instructions for
determining whether the subject is a carrier of a defective RFX4_v3 gene (e.g., instructions required by
the regulations for in vitro diagnostic products). In preferred embodiments, the instructions specify that
the presence or absence of a mutant RFX4_v3 allele is detected in the subject, and that subjects having
an allele containing a mutation have an increased risk of passing that mutated gene to their children,
which may result in congenital hydrocephalus. In some embodiments, the kits include ancillary reagents
such as buffering agents, nucleic acid stabilizing reagents, protein stabilizing reagents, and
signal-producing systems (e.g., fluorescence-generating systems such as FRET systems). The test kit may
be packaged in any suitable manner, typically with the elements in a single container or various
containers as necessary, along with a sheet of instructions for carrying out the test. In some
embodiments, the kits preferably also include a negative control sample.
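The requirement that a kit reagent hybridize to polymorphism-containing nucleic acids but not to those lacking the polymorphism is usually met by making the probe perfectly complementary to one allele. The other allele then forms a mismatch at the polymorphic site. The sketch below counts such mismatches. The sequences are hypothetical placeholders, not RFX4_v3 sequence, and real discrimination also depends on probe length, mismatch position and hybridization stringency.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(probe_5to3: str, target_5to3: str) -> int:
    """Count positions where the probe fails to pair with an equal-length target."""
    if len(probe_5to3) != len(target_5to3):
        raise ValueError("probe and target must be the same length")
    expected_target = reverse_complement(probe_5to3)
    return sum(a != b for a, b in zip(expected_target, target_5to3))


# Hypothetical 15-nt targets differing only at the central (polymorphic) position.
wild_type = "ACGTTGCAGTCCTAG"
mutant = "ACGTTGCTGTCCTAG"
probe = reverse_complement(mutant)  # designed to pair perfectly with the mutant allele
print(mismatches(probe, mutant), mismatches(probe, wild_type))  # -> 0 1
```

Placing the polymorphic base near the middle of a short probe, as in this example, is a common design choice, because a central mismatch destabilizes the duplex more than a terminal one.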

8. Bioinformatics 

In some embodiments, the present disclosure provides methods of determining whether an
individual carries a defective RFX4_v3 allele. In some embodiments, the analysis of polymorphism
data is automated. For example, in some embodiments, the present disclosure provides a
bioinformatics research system comprising a plurality of computers running a multi-platform
object-oriented programming language (see e.g., U.S. Patent 6,125,383). In some embodiments, one of the
computers stores genetics data (e.g., the severity of the congenital hydrocephalus associated with a given
polymorphism). In some embodiments, one of the computers stores application programs (e.g., for
analyzing transmission disequilibrium data or determining genotype relative risks and population
attributable risks). Results are then delivered to the user (e.g., via one of the computers or via the
internet).
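The analyses named above have standard textbook forms, sketched below. The function names are hypothetical and are not taken from the system of U.S. Patent 6,125,383.

- **Transmission disequilibrium test (TDT):** counts, across heterozygous parents, how often a candidate allele is transmitted (b) versus not transmitted (c) to affected offspring. The McNemar statistic (b - c)^2/(b + c) is then evaluated with one degree of freedom.
- **Genotype relative risk:** the ratio of disease risk (penetrance) in carriers of a genotype to that in the reference genotype.
- **Population attributable risk:** Levin's formula p(RR - 1)/(1 + p(RR - 1)), where p is the carrier frequency.

```python
def tdt_chi_square(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT statistic (b - c)^2 / (b + c), 1 degree of freedom."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def genotype_relative_risk(risk_in_genotype: float, risk_in_reference: float) -> float:
    """Ratio of disease risk (penetrance) in a genotype to the reference genotype."""
    return risk_in_genotype / risk_in_reference


def population_attributable_risk(carrier_frequency: float, relative_risk: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1))."""
    excess = carrier_frequency * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Illustrative numbers only (not RFX4_v3 data).
print(tdt_chi_square(30, 10))  # (30 - 10)^2 / 40 = 10.0
print(round(population_attributable_risk(0.1, 3.0), 4))  # 0.2 / 1.2 -> 0.1667
```

A TDT value of 10.0 exceeds 3.84, the 5% critical value of the chi-square distribution with one degree of freedom.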

IV. Generation of RFX4_v3 Antibodies

Antibodies can be generated to allow for the specific detection of RFX4_v3 protein. The
antibodies may be prepared using various immunogens. In one embodiment, the immunogen is an
RFX4_v3 peptide, used to generate antibodies that recognize human and non-human RFX4_v3, but not
RFX4_v1 or RFX4_v2. Such antibodies include, but are not limited to, polyclonal, monoclonal,
chimeric, single chain, Fab fragments, and Fab expression libraries.

Various procedures known in the art may be used for the production of polyclonal antibodies
directed against RFX4_v3. For the production of antibody, various host animals, including but not
limited to rabbits, mice, rats, sheep, goats, etc., can be immunized by injection with a peptide
corresponding to an RFX4_v3 epitope. In a preferred embodiment, the peptide is conjugated to an
immunogenic carrier (e.g., diphtheria toxoid, bovine serum albumin (BSA), or keyhole limpet
hemocyanin (KLH)). Various adjuvants may be used to increase the immunological response, depending
on the host species, including but not limited to Freund's (complete and incomplete), mineral gels
(e.g., aluminum hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols, polyanions,
peptides, oil emulsions, keyhole limpet hemocyanins, and dinitrophenol), and potentially useful human
adjuvants such as BCG (Bacille Calmette-Guerin) and Corynebacterium parvum.

For preparation of monoclonal antibodies directed toward RFX4_v3, it is contemplated that
any technique that provides for the production of antibody molecules by continuous cell lines in culture
will find use with the present disclosure (see e.g., Harlow and Lane, Antibodies: A Laboratory Manual,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY). These include but are not limited to
the hybridoma technique originally developed by Kohler and Milstein (Kohler and Milstein, Nature,
256:495-497 [1975]), as well as the trioma technique, the human B-cell hybridoma technique (see e.g.,
Kozbor et al., Immunol. Today, 4:72 [1983]), and the EBV-hybridoma technique to produce human
monoclonal antibodies (Cole et al., "The EBV-hybridoma technique and its application to human lung
cancer," in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96 [1985]).

In an additional embodiment of the disclosure, monoclonal antibodies are produced in germ-free
animals utilizing technology such as that described in PCT/US90/02545. Furthermore, it is
contemplated that human antibodies will be generated by human hybridomas (Cote et al., Proc. Natl.
Acad. Sci. USA, 80:2026-2030 [1983]) or by transforming human B cells with EBV in vitro (Cole
et al., "The EBV-hybridoma technique and its application to human lung cancer," in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, pp. 77-96 [1985]).

In addition, it is contemplated that techniques described for the production of single chain
antibodies (U.S. Patent 4,946,778) will find use in producing RFX4_v3-specific single chain
antibodies. An additional embodiment of the disclosure utilizes the techniques described for the
construction of Fab expression libraries (Huse et al., Science, 246:1275-1281 [1989]) to allow rapid
and easy identification of monoclonal Fab fragments with the desired specificity for RFX4_v3.

It is contemplated that any technique suitable for producing antibody fragments will find use
in generating antibody fragments that contain the idiotype (antigen binding region) of the antibody
molecule. For example, such fragments include but are not limited to: F(ab')2 fragments that can be
produced by pepsin digestion of the antibody molecule; Fab' fragments that can be generated by
reducing the disulfide bridges of the F(ab')2 fragment; and Fab fragments that can be generated by
treating the antibody molecule with papain and a reducing agent.

In the production of antibodies, it is contemplated that screening for the desired antibody will
be accomplished by techniques known in the art (e.g., radioimmunoassay, ELISA (enzyme-linked
immunosorbent assay), "sandwich" immunoassays, immunoradiometric assays, gel diffusion
precipitation reactions, immunodiffusion assays, in situ immunoassays (e.g., using colloidal gold,
enzyme or radioisotope labels), Western blots, precipitation reactions, agglutination assays (e.g., gel
agglutination assays, hemagglutination assays, etc.), complement fixation assays, immunofluorescence
assays, protein A assays, and immunoelectrophoresis assays, etc.).

In one embodiment, antibody binding is detected by detecting a label on the primary antibody.
In another embodiment, the primary antibody is detected by detecting binding of a secondary antibody
or reagent to the primary antibody. In a further embodiment, the secondary antibody is labeled. Many
means are known in the art for detecting binding in an immunoassay and are within the scope of the
present disclosure. (As is well known in the art, in a screening assay the immunogenic peptide should be
provided free of the carrier molecule used in the immunization protocol. For example, if the peptide was
conjugated to KLH for immunization, it may be conjugated to BSA, or used directly, in a screening assay.)

The foregoing antibodies can be used in methods known in the art relating to the localization
and structure of RFX4_v3 (e.g., for Western blotting), measuring levels thereof in appropriate
biological samples, etc. The antibodies can be used to detect RFX4_v3 in a biological sample from an
individual. The biological sample can be a biological fluid, such as, but not limited to, blood, serum,
plasma, interstitial fluid, urine, cerebrospinal fluid, amniotic fluid and the like, containing cells.

The biological samples can then be tested directly for the presence of human RFX4_v3 using
an appropriate strategy (e.g., ELISA or radioimmunoassay) and format (e.g., microwells or dipstick (e.g.,
as described in International Patent Publication WO 93/03367), etc.). Alternatively, proteins in the
sample can be size-separated (e.g., by polyacrylamide gel electrophoresis (PAGE), in the presence or
absence of sodium dodecyl sulfate (SDS)), and the presence of RFX4_v3 detected by immunoblotting
(Western blotting). Immunoblotting techniques are generally more effective with antibodies generated
against a peptide corresponding to an epitope of a protein, and hence are particularly suited to the
present disclosure.

V. Gene Therapy Using RFX4_v3

The present disclosure also provides methods and compositions suitable for gene therapy to
alter RFX4_v3 expression, production, or function. As described above, the present disclosure
provides human RFX4_v3 genes and provides methods of obtaining RFX4_v3 genes from other
species. Thus, the methods described below are generally applicable across many species. In some
embodiments, it is contemplated that gene therapy is performed by providing a subject with a wild-type
allele of RFX4_v3. Subjects in need of such therapy are identified by the methods described above.
As described above, RFX4_v3 is primarily expressed in the brain. Accordingly, a preferred method of
gene therapy is to replace the defective transcript with wild-type RFX4_v3.

Viral vectors commonly used for in vivo or ex vivo targeting and therapy procedures are DNA-based
vectors and retroviral vectors. Methods for constructing and using viral vectors are known in the
art (see e.g., Miller and Rosman, BioTech., 7:980-990 [1992]). Preferably, the viral vectors are
replication defective, that is, they are unable to replicate autonomously in the target cell. In general,
the genome of the replication defective viral vectors that are used within the scope of the present
disclosure lacks at least one region that is necessary for the replication of the virus in the infected cell.
These regions can either be eliminated (in whole or in part), or be rendered non-functional by any
technique known to a person skilled in the art. These techniques include the total removal, substitution
(by other sequences, in particular by the inserted nucleic acid), partial deletion or addition of one or
more bases to an essential (for replication) region. Such techniques may be performed in vitro (i.e., on
the isolated DNA) or in situ, using the techniques of genetic manipulation or by treatment with
mutagenic agents.
Preferably, the replication defective virus retains the sequences of its genome that are
necessary for encapsidating the viral particles. DNA viral vectors include attenuated or defective
DNA viruses, including, but not limited to, herpes simplex virus (HSV), papillomavirus, Epstein Barr
virus (EBV), adenovirus, adeno-associated virus (AAV), and the like. Defective viruses, which entirely
or almost entirely lack viral genes, are preferred, as a defective virus is not infective after introduction
into a cell. Use of defective viral vectors allows for administration to cells in a specific, localized area,
without concern that the vector can infect other cells. Thus, a specific tissue can be specifically
targeted. Examples of particular vectors include, but are not limited to, a defective herpes virus 1
(HSV1) vector (Kaplitt et al., Mol. Cell. Neurosci., 2:320-330 [1991]), a defective herpes virus vector
lacking a glycoprotein L gene (see e.g., Patent Publication RD 371005 A), or other defective herpes
virus vectors (see e.g., WO 94/21807; and WO 92/05263); an attenuated adenovirus vector, such as the
vector described by Stratford-Perricaudet et al. (J. Clin. Invest., 90:626-630 [1992]; see also, La Salle
et al., Science, 259:988-990 [1993]); and a defective adeno-associated virus vector (Samulski et al., J.
Virol., 61:3096-3101 [1987]; Samulski et al., J. Virol., 63:3822-3828 [1989]; and Lebkowski et al.,
Mol. Cell. Biol., 8:3988-3996 [1988]).

Preferably, for in vivo administration, an appropriate immunosuppressive treatment is
employed in conjunction with the viral vector (e.g., adenovirus vector), to avoid immuno-deactivation
of the viral vector and transfected cells. For example, immunosuppressive cytokines, such as
interleukin-12 (IL-12) or interferon-gamma (IFN-γ), or an anti-CD4 antibody, can be administered to
block humoral or cellular immune responses to the viral vectors. In addition, it is advantageous to
employ a viral vector that is engineered to express a minimal number of antigens.

In a preferred embodiment, the vector is an adenovirus vector. Adenoviruses are eukaryotic
DNA viruses that can be modified to efficiently deliver a nucleic acid of the disclosure to a variety of
cell types. Various serotypes of adenovirus exist. Of these serotypes, preference is given, within the
scope of the present disclosure, to type 2 or type 5 human adenoviruses (Ad 2 or Ad 5), or adenoviruses
of animal origin (see e.g., WO94/26914). Those adenoviruses of animal origin that can be used within
the scope of the present disclosure include adenoviruses of canine, bovine, murine (e.g., Mav1, Beard
et al., Virol., 75-81 [1990]), ovine, porcine, avian, and simian (e.g., SAV) origin.

Preferably, the replication defective adenoviral vectors of the disclosure comprise ITRs, an
encapsidation sequence and the nucleic acid of interest. Still more preferably, at least the E1 region of
the adenoviral vector is non-functional. The deletion in the E1 region preferably extends from
nucleotides 455 to 3329 in the sequence of the Ad5 adenovirus (PvuII-BglII fragment) or 382 to 3446
(HinfII-Sau3A fragment). Other regions may also be modified, in particular the E3 region (e.g.,
WO95/02697), the E2 region (e.g., WO94/28938), the E4 region (e.g., WO94/28152, WO94/12649 and
WO95/02697), or any of the late genes L1-L5.
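Because the coordinates above are inclusive, the size of each E1 deletion is end - start + 1. The short sketch below applies that arithmetic to the two Ad5 ranges given above. It assumes 1-based inclusive numbering, as is conventional for nucleotide coordinates, and the helper name is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive coordinate range."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


E1_DELETIONS = {
    "PvuII-BglII (nt 455-3329)": (455, 3329),
    "HinfII-Sau3A (nt 382-3446)": (382, 3446),
}
for name, (start, end) in E1_DELETIONS.items():
    print(f"{name}: {inclusive_length(start, end)} bp")
```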

In a preferred embodiment, the adenoviral vector has a deletion in the E1 region (Ad 1.0).
Examples of E1-deleted adenoviruses are disclosed in EP 185,573, the contents of which are
incorporated herein by reference. In another preferred embodiment, the adenoviral vector has a
deletion in the E1 and E4 regions (Ad 3.0). Examples of E1/E4-deleted adenoviruses are disclosed in
WO95/02697 and WO96/22378. In still another preferred embodiment, the adenoviral vector has a
deletion in the E1 region into which the E4 region and the nucleic acid sequence are inserted.

The replication defective recombinant adenoviruses according to the disclosure can be
prepared by any technique known to the person skilled in the art (see e.g., Levrero et al., Gene, 101:195
[1991]; EP 185 573; and Graham, EMBO J., 3:2917 [1984]). In particular, they can be prepared by
homologous recombination between an adenovirus and a plasmid that carries, inter alia, the DNA
sequence of interest. The homologous recombination is accomplished following co-transfection of the
adenovirus and plasmid into an appropriate cell line. The cell line that is employed should preferably
(i) be transformable by the elements to be used, and (ii) contain the sequences that are able to
complement the part of the genome of the replication defective adenovirus, preferably in integrated
form in order to avoid the risks of recombination. Examples of cell lines that may be used are the
human embryonic kidney cell line 293 (Graham et al., J. Gen. Virol., 36:59 [1977]), which contains the
left-hand portion of the genome of an Ad5 adenovirus (12%) integrated into its genome, and cell lines
that are able to complement the E1 and E4 functions, as described in applications WO94/26914 and
WO95/02697. Recombinant adenoviruses are recovered and purified using standard molecular
biological techniques, which are well known to one of ordinary skill in the art.

The adeno-associated viruses (AAV) are DNA viruses of relatively small size that can
integrate, in a stable and site-specific manner, into the genome of the cells that they infect. They are
able to infect a wide spectrum of cells without inducing any effects on cellular growth, morphology or
differentiation, and they do not appear to be involved in human pathologies. The AAV genome has
been cloned, sequenced and characterized. It encompasses approximately 4700 bases and contains an
inverted terminal repeat (ITR) region of approximately 145 bases at each end, which serves as an origin
of replication for the virus. The remainder of the genome is divided into two essential regions that
carry the encapsidation functions: the left-hand part of the genome, which contains the rep gene
involved in viral replication and expression of the viral genes; and the right-hand part of the genome,
which contains the cap gene encoding the capsid proteins of the virus.
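Since an ITR sits at each end, the region between them that carries rep and cap is roughly the genome length minus two ITR lengths. This is also the region that the AAV-derived vectors described next replace with a gene of interest. The arithmetic below uses only the approximate figures given above, so the result is a rough guide rather than an exact capacity.

```python
AAV_GENOME_NT = 4700  # approximate genome length stated above
ITR_NT = 145  # approximate length of each inverted terminal repeat

# Region between the two ITRs, carrying the rep and cap genes.
internal_region_nt = AAV_GENOME_NT - 2 * ITR_NT
print(internal_region_nt)  # 4410
```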

The use of vectors derived from the AAVs for transferring genes in vitro and in vivo has been
described (see e.g., WO 91/18088; WO 93/09239; U.S. Pat. No. 4,797,368; U.S. Pat. No. 5,139,941; and
EP 488 528). These publications describe various AAV-derived constructs in which the rep and/or cap
genes are deleted and replaced by a gene of interest, and the use of these constructs for transferring the
gene of interest in vitro (into cultured cells) or in vivo (directly into an organism). The replication
defective recombinant AAVs according to the disclosure can be prepared by co-transfecting a plasmid
containing the nucleic acid sequence of interest flanked by two AAV inverted terminal repeat (ITR)
regions, and a plasmid carrying the AAV encapsidation genes (rep and cap genes), into a cell line that
is infected with a human helper virus (for example an adenovirus). The AAV recombinants that are
produced are then purified by standard techniques.

In another embodiment, the gene can be introduced in a retroviral vector (e.g., as described in
U.S. Pat. Nos. 5,399,346; 4,650,764; 4,980,289; and 5,124,263; Mann et al., Cell, 33:153 [1983];
Markowitz et al., J. Virol., 62:1120 [1988]; PCT/US95/14575; EP 453242; EP 178220; Bernstein et al.,
Genet. Eng., 7:235 [1985]; McCormick, BioTechnol., 3:689 [1985]; WO 95/07358; and Kuo et al.,
Blood, 82:845 [1993]). Retroviruses are integrating viruses that infect dividing cells. The
retrovirus genome includes two LTRs, an encapsidation sequence and three coding regions (gag, pol
and env). In recombinant retroviral vectors, the gag, pol and env genes are generally deleted, in whole
or in part, and replaced with a heterologous nucleic acid sequence of interest. These vectors can be
constructed from different types of retrovirus, such as HIV, MoMuLV ("murine Moloney Leukaemia
Virus"), MSV ("murine Moloney Sarcoma Virus"), HaSV ("Harvey Sarcoma Virus"), SNV ("Spleen
Necrosis Virus"), RSV ("Rous Sarcoma Virus") and Friend virus. Defective retroviral vectors are also
disclosed in WO95/02697.

In general, in order to construct recombinant retroviruses containing a nucleic acid sequence, a
plasmid is constructed that contains the LTRs, the encapsidation sequence and the coding sequence.
This construct is used to transfect a packaging cell line, which is able to supply in trans the
retroviral functions that are deficient in the plasmid. In general, the packaging cell lines are thus able
to express the gag, pol and env genes. Such packaging cell lines have been described in the prior art, in
particular the cell line PA317 (U.S. Pat. No. 4,861,719), the PsiCRIP cell line (see WO90/02806), and
the GP+envAm-12 cell line (see WO89/07150). In addition, the recombinant retroviral vectors can
contain modifications within the LTRs for suppressing transcriptional activity as well as extensive
encapsidation sequences that may include a part of the gag gene (Bender et al., J. Virol., 61:1639
[1987]). Recombinant retroviral vectors are purified by standard techniques known to those having
ordinary skill in the art.

Alternatively, the vector can be introduced in vivo by lipofection. For the past decade, there
has been increasing use of liposomes for encapsulation and transfection of nucleic acids in vitro.
Synthetic cationic lipids designed to limit the difficulties and dangers encountered with
liposome-mediated transfection can be used to prepare liposomes for in vivo transfection of a gene
encoding a marker (Felgner et al., Proc. Natl. Acad. Sci. USA, 84:7413-7417 [1987]; see also, Mackey
et al., Proc. Natl. Acad. Sci. USA, 85:8027-8031 [1988]; Ulmer et al., Science, 259:1745-1748 [1993]).
The use of cationic lipids may promote encapsulation of negatively charged nucleic acids, and also
promote fusion with negatively charged cell membranes (Felgner and Ringold, Nature, 337:387-388
[1989]). Particularly useful lipid compounds and compositions for transfer of nucleic acids are
described in WO95/18863 and WO96/17823, and in U.S. Pat. No. 5,459,127.

Other molecules are also useful for facilitating transfection of a nucleic acid in vivo, such as a
cationic oligopeptide (e.g., WO95/21931), peptides derived from DNA binding proteins (e.g.,
WO96/25508), or a cationic polymer (e.g., WO95/21931).

It is also possible to introduce the vector in vivo as a naked DNA plasmid. Methods for
formulating and administering naked DNA to mammalian muscle tissue are disclosed in U.S. Pat. Nos.
5,580,859 and 5,589,466.

DNA vectors for gene therapy can be introduced into the desired host cells by methods known
in the art, including but not limited to transfection, electroporation, microinjection, transduction, cell
fusion, DEAE dextran, calcium phosphate precipitation, use of a gene gun, or use of a DNA vector
transporter (see e.g., Wu et al., J. Biol. Chem., 267:963 [1992]; Wu and Wu, J. Biol. Chem., 263:14621
[1988]; and Williams et al., Proc. Natl. Acad. Sci. USA, 88:2726 [1991]). Receptor-mediated DNA
delivery approaches can also be used (Curiel et al., Hum. Gene Ther., 3:147 [1992]; and Wu and Wu, J.
Biol. Chem., 262:4429 [1987]).

VI. Transgenic Animals Expressing Heterologous RFX4_v3 Genes and Homologs, Mutants, and Variants Thereof

A line of transgenic mice that lacks RFX4_v3 was generated by a transgene insertion within
the last intron of the RFX4 gene. Targeted insertional mutagenesis in mice has become a standard
method for uncovering the roles of a specific gene in development. However, several instances of
accidental insertion of a transgene into a critical genomic locus have yielded important information as
well. For example, a Reeler-like phenotype was observed in one line of transgenic mice harboring an
unrelated transgene (Miao et al., Proc. Natl. Acad. Sci. USA, 91:11050-4 [1994], herein incorporated by
reference). The transgene had interrupted what is now known as the Reeler locus, and much has since
been learned about the function of this gene and its gene product, reelin, in regulating the development
of the central nervous system (D'Arcangelo et al., Nature, 374:719-23 [1995]; D'Arcangelo et al.,
Brain Res. Mol. Brain Res., 39:234-6 [1996]; Rice and Curran, Annu. Rev. Neurosci., 24:1005-39
[2001]). Several other examples have been described recently (Friedman et al., Laryngoscope,
110:489-96 [2000]; Durkin et al., Genomics, 73:20-7 [2001]; Overbeek, Genesis, 30:26-35 [2001]).

The phenotypes of the transgenic mice were dosage-dependent: brains from heterozygous
mice expressed approximately 50% of normal levels of the brain-specific transcript, and the mice
exhibited universal, severe congenital hydrocephalus. This obstructive hydrocephalus appeared to be
secondary to failure of development of the subcommissural organ (SCO), a structure that is important
for the patency of the aqueduct of Sylvius and normal cerebrospinal fluid flow in the brain (Perez-Figares
et al., Microsc. Res. Tech., 52:591-607 [2001]; Rodriguez et al., Microsc. Res. Tech., 52:573-90 [2001];
Vio et al., Exp. Brain Res., 135:41-52 [2000]; Perez-Figares et al., J. Neuropathol. Exp. Neurol.,
57:188-202 [1998]; Rodriguez et al., Microsc. Res. Tech., 41:98-123 [1998]; Cifuentes et al., Exp.
Brain Res., 98:431-40 [1994]). The heterozygous condition was compatible with life and fertility in
some cases.

A single transgene insertion was demonstrated by Southern blotting of genomic DNA from
affected mice. PCR-based techniques revealed that the inserted transgene consisted of at least 15 kb of
foreign DNA, representing at least two tandem copies of the original 7.5 kb transgene. Using a
GENOMEWALKER (BD Biosciences, Palo Alto, CA) approach with genomic DNA from transgenic
mice, the 5' and 3' genomic sequences adjacent to the transgene insertion site were identified. These
sequences were matched to incomplete mouse genomic sequences in GenBank. The mouse genomic
sequences are highly related to a human chromosome 12 sequence. A BAC contig containing the
human chromosome 12 sequence was analyzed for expressed sequences. All exons of the human RFX4
gene, encoding a winged helix protein previously described as a testis-specific transcript
(Morotomi-Yano et al., J. Biol. Chem., 277:836-842 [2002], herein incorporated by reference), were
found over a genomic region of nearly 100 kb.

Using probes derived from the junctions between the inserted transgene and the endogenous
mouse genomic DNA, the wild-type (+/+) and transgene-interrupted alleles were distinguished by both
Southern blotting and PCR-based approaches. Southern blots hybridized with a transgene-specific probe
showed additional bands in heterozygous mice that were not present in wild-type mice. PCR was also
used to identify wild-type (+/+), heterozygous (+/-), and homozygous (-/-) mutant mice. PCR reactions
were performed with primer pairs that either spanned the transgene insertion site or were
transgene-specific. Both approaches revealed the presence of both wild-type and "knockout" alleles in
all of the affected mice.
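The genotype logic implied by the two PCR assays can be sketched as a small Python function. This is a hypothetical illustration, not the authors' protocol: it assumes (the text does not say so) that the primer pair spanning the insertion site amplifies only the uninterrupted wild-type allele, since the insert of at least 15 kb would be impractical to amplify under routine conditions, and that the transgene-specific pair amplifies only the interrupted allele. The names `Genotype` and `call_genotype` are illustrative.

```python
from enum import Enum


class Genotype(Enum):
    WILD_TYPE = "+/+"
    HETEROZYGOUS = "+/-"
    HOMOZYGOUS = "-/-"


def call_genotype(spanning_product: bool, transgene_product: bool) -> Genotype:
    """Infer genotype from presence (True) or absence (False) of two PCR products.

    Assumption: the pair spanning the insertion site amplifies only the
    wild-type allele; the transgene-specific pair only the interrupted allele.
    """
    if spanning_product and transgene_product:
        return Genotype.HETEROZYGOUS   # both alleles present
    if spanning_product:
        return Genotype.WILD_TYPE      # only the uninterrupted allele
    if transgene_product:
        return Genotype.HOMOZYGOUS     # only the interrupted allele
    # No product at all points to a failed reaction, not to a genotype.
    raise ValueError("no PCR product detected; repeat the assay")


# Example: a tail-DNA sample showing both bands.
print(call_genotype(True, True).value)  # prints "+/-"
```

Treating "no product" as an error rather than a genotype keeps a failed reaction from being misread as a homozygous call.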

Despite having severe hydrocephalus, significant proportions of both male and female mice
survived to adulthood and were fertile. Interbreeding heterozygous (+/-) mice resulted in the birth of
live pups with the homozygous (-/-) genotype, but these pups died shortly after birth and had obvious
brain malformations. Examination of fetal mice showed that homozygous embryos exhibited severe brain
malformations at embryonic (E) days 18.5 and 16.5. Embryos at E12.5 had more orderly and characteristic
brain structures, but they also exhibited severe brain malformations. The characteristic
obstructive midline brain malformation was seen in all homozygous embryos examined.
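For orientation, the genotype proportions expected from a heterozygous intercross can be enumerated as below, assuming simple Mendelian segregation at a single autosomal locus. This is a textbook expectation offered as an illustration, not data from the study; the text reports no observed ratios, and the reduced in utero survival of heterozygotes and the neonatal death of homozygotes would shift the ratios seen at any given age. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_ratios(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Expected offspring genotype frequencies at one autosomal locus.

    Genotypes are two-character strings: "+" marks the wild-type allele and
    "-" the transgene-interrupted allele, e.g. "+-" for a heterozygote.
    """
    counts = Counter(
        # sorted() places "+" before "-", so "-+" and "+-" count as one genotype
        "".join(sorted(a + b))
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Heterozygous (+/-) x heterozygous (+/-) intercross.
for genotype, freq in sorted(offspring_ratios("+-", "+-").items()):
    print(genotype, freq)  # prints "++ 1/4", then "+- 1/2", then "-- 1/4"
```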

To confirm that transgene insertion could prevent expression of a full-length RFX4 transcript
in brain, Northern blots of brain RNA from neonatal wild-type, heterozygous, and homozygous mice were
probed with a mouse EST cDNA clone that was highly related to the putative final exon of the human
cDNA and genomic sequence (Morotomi-Yano, et al., J. Biol. Chem., 277:836-842 [2002]). The EST
probe revealed a transcript of approximately 4 kb in brain, whereas a smaller transcript of
about 3 kb was detected in testis and liver. Brains from heterozygous mice expressed
approximately 50% of the normal complement of the 4 kb transcript, whereas homozygous mice
expressed no detectable transcript of this size.
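The "approximately 50%" figure is a relative quantity. One common way to obtain such a value, sketched here under assumptions the text does not state, is to divide each band's densitometric signal by a loading-control signal from the same lane (for example, a housekeeping transcript) and express the result relative to wild type. The function name and all signal values are hypothetical.

```python
def relative_expression(target: float, loading: float,
                        wt_target: float, wt_loading: float) -> float:
    """Loading-normalized band signal as a fraction of the wild-type value.

    target, loading:       densitometry of the 4 kb band and of a loading
                           control in the sample lane (hypothetical inputs)
    wt_target, wt_loading: the same two measurements in a wild-type lane
    """
    for value in (loading, wt_target, wt_loading):
        if value <= 0:
            raise ValueError("reference signals must be positive")
    return (target / loading) / (wt_target / wt_loading)


# Hypothetical lanes: a heterozygote with equal loading and about half the 4 kb signal.
het = relative_expression(target=520.0, loading=1000.0,
                          wt_target=1040.0, wt_loading=1000.0)
print(round(het, 2))
```

A homozygous lane with no detectable 4 kb band would simply give 0.0.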

Heterozygous mice appear to have a higher than normal in utero mortality rate. Many appear
normal morphologically and behaviorally, although these mice were shown to have histological
evidence of hydrocephalus. Some of these mice survived to adulthood and were fertile.
Hydrocephalus was externally obvious in many of the heterozygous mice within 4-8 weeks after birth.
Some mice with obvious hydrocephalus developed rapid neurological deterioration and died within a
few days.

Histologically, in heterozygous mice the hydrocephalus was apparent in the third and lateral
ventricles. In addition, dilatation of the olfactory ventricles was seen at the time of birth.
Anatomically, examination revealed the absence or near absence of the subcommissural organ (SCO).
This organ is thought to be critical for maintaining cerebrospinal fluid (CSF) flow through the
aqueduct of Sylvius; its ablation by various techniques leads to hydrocephalus (Perez-Figares, et al.,
Microsc. Res. Tech., 52:591-601 [2001], herein incorporated by reference). The absence of this organ
was detectable by routine histological staining. Upon staining with antibodies specific for
the Reissner's fibers that comprise this organ, staining in heterozygous mice was weaker than
in wild-type mice. A small amount of antibody staining could occasionally be detected in
the SCO region of heterozygous mice, demonstrating that the molecular pathways leading to the
production of the Reissner's fiber proteins are present, if underused, in the heterozygous animals.

The present disclosure also contemplates the generation of additional transgenic animals,
including but not limited to mice, comprising an exogenous RFX4_v3 gene or homologs, mutants, or
variants thereof. In preferred embodiments, the transgenic animal displays an altered phenotype as
compared to wild-type animals. In some embodiments, the altered phenotype is the abnormal
expression of mRNA for an RFX4_v3 gene as compared to wild-type levels of RFX4_v3 expression.
Methods for analyzing the presence or absence of such phenotypes include Northern blotting, RNase
protection assays, and RT-PCR. In other embodiments, the transgenic animals have a knockout
mutation of the RFX4_v3 gene. In still further embodiments, the transgenic animals express an
RFX4_v3 variant gene. In preferred embodiments, the transgenic animals display a congenital
hydrocephalus phenotype.

In other embodiments, test compounds (e.g., a drug or other exogenous agent that is suspected
of being useful to treat congenital hydrocephalus) and control compounds (e.g., a placebo) are
administered to the transgenic animals and to control animals, and the effects are evaluated.

The transgenic animals can be generated via a variety of methods, including, but not limited to,
the method described above. In some embodiments, embryonic cells at various developmental stages
are used to introduce transgenes for the production of transgenic animals. Different methods are used
depending on the stage of development of the embryonal cell. The zygote is the best target for
microinjection. In the mouse, the male pronucleus reaches approximately 20 micrometers in
diameter, which allows reproducible injection of 1-2 picoliters (pl) of DNA solution. The use of
zygotes as a target for gene transfer has a major advantage in that, in most cases, the injected DNA will
be incorporated into the host genome before the first cleavage (Brinster et al., Proc. Natl. Acad. Sci.
USA, 82:4438-4442 [1985]). As a consequence, all cells of the transgenic non-human animal will carry
the incorporated transgene. This will in general also be reflected in efficient transmission of the
transgene to offspring of the founder, since 50% of the germ cells will harbor the transgene. U.S. Patent
No. 4,873,191 describes a method for the microinjection of zygotes.

In other embodiments, retroviral infection is used to introduce transgenes into a non-human
animal. In some embodiments, the retroviral vector is utilized to transfect oocytes by injecting the
retroviral vector into the perivitelline space of the oocyte (U.S. Pat. No. 6,080,912). In other
embodiments, the developing non-human embryo can be cultured in vitro to the blastocyst stage.
During this time, the blastomeres can be targets for retroviral infection (Jaenisch, Proc. Natl. Acad. Sci.
USA, 73:1260 [1976]). Efficient infection of the blastomeres is obtained by enzymatic treatment to
remove the zona pellucida (Hogan et al., in Manipulating the Mouse Embryo, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y. [1986]). The viral vector system used to introduce the
transgene is typically a replication-defective retrovirus carrying the transgene (Jahner et al., Proc. Natl.
Acad. Sci. USA, 82:6927 [1985]). Transfection is easily and efficiently obtained by culturing the
blastomeres on a monolayer of virus-producing cells (Van der Putten, supra; Stewart, et al., EMBO J.,
6:383 [1987]). Alternatively, infection can be performed at a later stage. Virus or virus-producing
cells can be injected into the blastocoele (Jahner et al., Nature, 298:623 [1982]). Most of the founders
will be mosaic for the transgene, since incorporation occurs only in a subset of the cells that form the
transgenic animal. Further, the founder may contain various retroviral insertions of the transgene at
different positions in the genome, which generally will segregate in the offspring. In addition, it is also
possible to introduce transgenes into the germline, albeit with low efficiency, by intrauterine retroviral
infection of the midgestation embryo (Jahner et al., supra [1982]). Additional means known to the art of
using retroviruses or retroviral vectors to create transgenic animals involve the microinjection of
retroviral particles or mitomycin C-treated cells producing retrovirus into the perivitelline
space of fertilized eggs or early embryos (PCT International Application WO 90/08832 [1990], and
Haskell and Bowen, Mol. Reprod. Dev., 40:386 [1995]).

In other embodiments, the transgene is introduced into embryonic stem (ES) cells and the
transfected stem cells are utilized to form an embryo. ES cells are obtained by culturing
preimplantation embryos in vitro under appropriate conditions (Evans et al., Nature, 292:154 [1981];
Bradley et al., Nature, 309:255 [1984]; Gossler et al., Proc. Natl. Acad. Sci. USA, 83:9065 [1986]; and
Robertson et al., Nature, 322:445 [1986]). Transgenes can be efficiently introduced into ES cells
by DNA transfection using a variety of methods known to the art, including calcium phosphate
coprecipitation, protoplast or spheroplast fusion, lipofection and DEAE-dextran-mediated transfection.
Transgenes may also be introduced into ES cells by retrovirus-mediated transduction or by
microinjection. Such transfected ES cells can thereafter colonize an embryo following their introduction
into the blastocoel of a blastocyst-stage embryo and contribute to the germ line of the resulting chimeric
animal (for review, see Jaenisch, Science, 240:1468 [1988]). Prior to the introduction of transfected
ES cells into the blastocoel, the transfected ES cells may be subjected to various selection protocols to
enrich for ES cells that have integrated the transgene, assuming that the transgene provides a means
for such selection. Alternatively, the polymerase chain reaction may be used to screen for ES cells that
have integrated the transgene. This technique obviates the need for growth of the transfected ES cells
under appropriate selective conditions prior to transfer into the blastocoel.

In still other embodiments, homologous recombination is utilized to knock out gene function or
to create deletion mutants. Methods for homologous recombination are described in U.S. Pat. No.
5,614,396.



VII. Drug Screening Using RFX4_v3

The present disclosure provides methods and compositions for using RFX4_v3 as a target for
screening drugs that can alter the phenotypic expression of congenital hydrocephalus.

One technique for drug screening provides high-throughput screening for compounds having
suitable binding affinity to RFX4_v3 peptides and is described in detail in WO 84/03564, incorporated
herein by reference. Briefly, large numbers of different small peptide test compounds are synthesized
on a solid substrate, such as plastic pins or some other surface. The peptide test compounds are then
reacted with RFX4_v3 peptides and washed. Bound RFX4_v3 peptides are then detected by methods
well known in the art.

Another technique uses RFX4_v3 antibodies, generated as discussed above. Such antibodies,
capable of specifically binding to RFX4_v3 peptides, compete with a test compound for binding to
RFX4_v3. In this manner, the antibodies can be used to detect the presence of any peptide that shares
one or more antigenic determinants with the RFX4_v3 peptide.

The present disclosure contemplates many other means of screening compounds. The
examples provided above are presented merely to illustrate a range of techniques available. One of
ordinary skill in the art will appreciate that many other screening methods can be used.

In particular, the present disclosure contemplates the use of cell lines transfected with
RFX4_v3 and variants or mutants thereof for screening compounds for activity, and in particular for
high-throughput screening of compounds from combinatorial libraries (e.g., libraries containing greater
than 10⁴ compounds). The cell lines of the present disclosure can be used in a variety of screening
methods. In some embodiments, the cells can be used in second messenger assays that monitor signal
transduction following activation of cell-surface receptors. In other embodiments, the cells can be used
in reporter gene assays that monitor cellular responses at the transcription/translation level. In still
further embodiments, the cells can be used in cell proliferation assays to monitor the overall growth/no
growth response of cells to external stimuli.

In second messenger assays, the host cells are preferably transfected as described above with
vectors encoding RFX4_v3 or variants or mutants thereof. The host cells are then treated with a
compound or plurality of compounds (e.g., from a combinatorial library) and assayed for the presence
or absence of a response. It is contemplated that at least some of the compounds in the combinatorial
library can serve as agonists, antagonists, activators, or inhibitors of the protein or proteins encoded by
the vectors. It is also contemplated that at least some of the compounds in the combinatorial library
can serve as agonists, antagonists, activators, or inhibitors of proteins acting upstream or downstream of
the protein encoded by the vector in a signal transduction pathway.

In some embodiments, the second messenger assays measure fluorescent signals from reporter
molecules that respond to intracellular changes (e.g., Ca2+ concentration, membrane potential, pH, IP3,
cAMP, arachidonic acid release) due to stimulation of membrane receptors and ion channels (e.g.,
ligand-gated ion channels; see Denyer et al., Drug Discov. Today, 3:323 [1998]; and Gonzales et al.,
Drug Discov. Today, 4:431-39 [1999]). Examples of reporter molecules include, but are not limited
to, FRET (fluorescence resonance energy transfer) systems (e.g., Cuo-lipids and oxonols,
EDAN/DABCYL), calcium-sensitive indicators (e.g., Fluo-3, Fura-2, Indo-1, Fluo-3/AM, and
BAPTA AM), chloride-sensitive indicators (e.g., SPQ, SPA), potassium-sensitive indicators (e.g.,
PBFI), sodium-sensitive indicators (e.g., SBFI), and pH-sensitive indicators (e.g., BCECF).

In general, the host cells are loaded with the indicator prior to exposure to the compound.
Responses of the host cells to treatment with the compounds can be detected by methods known in the
art, including, but not limited to, fluorescence microscopy, confocal microscopy (e.g., FCS systems),
flow cytometry, microfluidic devices, FLIPR systems (see, e.g., Schroeder and Neagle, J. Biomol.
Screening, 1:75 [1996]), and plate-reading systems. In some preferred embodiments, the response
(e.g., increase in fluorescence intensity) caused by a compound of unknown activity is compared to the
response generated by a known agonist and expressed as a percentage of the maximal response of the
known agonist. The maximal response caused by a known agonist is defined as a 100% response.
Likewise, the maximal response recorded after addition of an agonist to a sample containing a known
or test antagonist is detectably lower than the 100% response.
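The normalization to a known agonist can be written as a small function. Subtracting a vehicle-only baseline before scaling is an assumption added here for illustration (the text defines only the 100% reference point); the function name and signal values are hypothetical fluorescence readings.

```python
def percent_of_max(test_signal: float, baseline: float,
                   agonist_max_signal: float) -> float:
    """Express a response as a percentage of a known agonist's maximal response.

    baseline is the signal without compound (an assumption; the text defines only
    the 100% point). The known agonist's maximal response maps to 100%.
    """
    span = agonist_max_signal - baseline
    if span <= 0:
        raise ValueError("agonist response must exceed baseline")
    return 100.0 * (test_signal - baseline) / span


baseline, agonist_max = 100.0, 1100.0
print(percent_of_max(600.0, baseline, agonist_max))  # test compound alone; prints 50.0
print(percent_of_max(350.0, baseline, agonist_max))  # agonist + candidate antagonist; prints 25.0
```

In this framing, a candidate antagonist shows up as an agonist response that falls detectably below 100%, as in the second call.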

The cells are also useful in reporter gene assays. Reporter gene assays involve the use of host
cells transfected with vectors encoding a nucleic acid comprising transcriptional control elements of a
target gene (i.e., a gene that controls the biological expression and function of a disease target) spliced
to a coding sequence for a reporter gene. Therefore, conditions that activate the target gene's control
elements also activate expression of the reporter gene product.
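Reporter gene readouts are often summarized as fold induction over vehicle-treated cells. The sketch below assumes replicate wells and a simple ratio of means; the metric, the function name, and the example readings are illustrative and are not specified in the text.

```python
from statistics import mean


def fold_induction(treated: list[float], vehicle: list[float]) -> float:
    """Ratio of mean reporter signal in compound-treated wells to vehicle wells."""
    if not treated or not vehicle:
        raise ValueError("need at least one replicate per condition")
    control = mean(vehicle)
    if control <= 0:
        raise ValueError("vehicle signal must be positive")
    return mean(treated) / control


# Hypothetical luminescence readings from triplicate wells.
print(fold_induction([410.0, 500.0, 590.0], [95.0, 100.0, 105.0]))  # prints 5.0
```

A value near 1.0 indicates no effect on the target gene's control elements; values above or below 1.0 suggest activation or repression, respectively.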

VIII. Pharmaceutical Compositions Containing RFX4_v3 Nucleic Acid, Peptides, and Analogs

The present disclosure further provides pharmaceutical compositions which may comprise all
or portions of RFX4_v3 polynucleotide sequences, RFX4_v3 polypeptides, inhibitors, antagonists,
enhancers or agonists of RFX4_v3 bioactivity, including antibodies, alone or in combination with at
least one other agent, such as a stabilizing compound, and may be administered in any sterile,
biocompatible pharmaceutical carrier, including, but not limited to, saline, buffered saline, dextrose,
and water.

The methods of the present disclosure find use in treating diseases or altering physiological
states. Peptides can be administered to the patient intravenously in a pharmaceutically acceptable
carrier such as physiological saline. Standard methods for intracellular delivery of peptides can be used
(e.g., delivery via liposome). Such methods are well known to those of ordinary skill in the art. The
formulations of this disclosure are useful for parenteral administration, such as intravenous,
subcutaneous, intramuscular, intraperitoneal, intrathecal, or intraventricular. Therapeutic
administration of a polypeptide intracellularly can also be accomplished using gene therapy as
described above, or by intravenous administration of the pharmaceutical composition.

As is well known in the medical arts, dosages for any one patient depend upon many factors,
including the patient's size, body surface area, age, the particular compound to be administered, sex,
time and route of administration, general health, and interaction with other drugs being concurrently
administered.

Accordingly, in some embodiments of the present disclosure, RFX4_v3 nucleotide and
RFX4_v3 amino acid sequences can be administered to a patient alone, or in combination with other
nucleotide sequences, drugs or hormones, or in pharmaceutical compositions where they are mixed with
excipient(s) or other pharmaceutically acceptable carriers. In one embodiment of the present
disclosure, the pharmaceutically acceptable carrier is pharmaceutically inert. In another embodiment of
the present disclosure, RFX4_v3 polynucleotide sequences or RFX4_v3 amino acid sequences may be
administered alone to individuals subject to or suffering from a disease.

Depending on the condition being treated, these pharmaceutical compositions may be
formulated and administered systemically or locally. Techniques for formulation and administration
may be found in the latest edition of "Remington's Pharmaceutical Sciences" (Mack Publishing Co.,
Easton, PA). Suitable routes may, for example, include oral or transmucosal administration, as well as
parenteral delivery, including intramuscular, subcutaneous, intramedullary, intrathecal, intraventricular,
intravenous, intraperitoneal, or intranasal administration.

For injection, the pharmaceutical compositions of the disclosure may be formulated in
aqueous solutions, preferably in physiologically compatible buffers such as Hanks' solution, Ringer's
solution, or physiologically buffered saline. For tissue or cellular administration, penetrants
appropriate to the particular barrier to be permeated are used in the formulation. Such penetrants are
generally known in the art.

In other embodiments, the pharmaceutical compositions of the present disclosure can be 
formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for oral 
administration. Such carriers enable the pharmaceutical compositions to be formulated as tablets, pills, 
capsules, liquids, gels, syrups, slurries, suspensions and the like, for oral or nasal ingestion by a patient 
to be treated. 

Pharmaceutical compositions suitable for use in the present disclosure include compositions 
wherein the active ingredients are contained in an effective amount to achieve the intended purpose. 
For example, an effective amount of RFX4_v3 may be that amount that protects against congenital 
hydrocephalus. Determination of effective amounts is well within the capability of those skilled in the 
art, especially in light of the disclosure provided herein. 

In addition to the active ingredients, these pharmaceutical compositions may contain suitable 
pharmaceutically acceptable carriers comprising excipients and auxiliaries, which facilitate processing 
of the active compounds into preparations that can be used pharmaceutically. The preparations 
formulated for oral administration may be in the form of tablets, dragees, capsules, or solutions. 

The pharmaceutical compositions of the present disclosure may be manufactured in a manner 
that is itself known (e.g., by means of conventional mixing, dissolving, granulating, dragee-making, 
levigating, emulsifying, encapsulating, entrapping or lyophilizing processes). 

Pharmaceutical formulations for parenteral administration include aqueous solutions of the 
active compounds in water-soluble form. Additionally, suspensions of the active compounds may be 
prepared as appropriate oily injection suspensions. Suitable lipophilic solvents or vehicles include fatty 
oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. 
Aqueous injection suspensions may contain substances that increase the viscosity of the suspension, 
such as sodium carboxymethyl cellulose, sorbitol, or dextran. Optionally, the suspension may also 
contain suitable stabilizers or agents that increase the solubility of the compounds to allow for the 
preparation of highly concentrated solutions. 

Pharmaceutical preparations for oral use can be obtained by combining the active compounds
with solid excipient, optionally grinding a resulting mixture, and processing the mixture of granules,
after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable excipients are
carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; starch
from corn, wheat, rice, potato, etc.; cellulose such as methyl cellulose, hydroxypropylmethylcellulose,
or sodium carboxymethylcellulose; gums, including gum arabic and tragacanth; and proteins such as
gelatin and collagen. If desired, disintegrating or solubilizing agents may be added, such as
cross-linked polyvinylpyrrolidone, agar, alginic acid or a salt thereof such as sodium alginate.

Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which
may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or
titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or
pigments may be added to the tablets or dragee coatings for product identification or to characterize the
quantity of active compound (i.e., dosage).

Pharmaceutical preparations that can be used orally include push-fit capsules made of gelatin,
as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol. The push-fit
capsules can contain the active ingredients mixed with a filler or binders such as lactose or starches,
lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active
compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or
liquid polyethylene glycol with or without stabilizers.

Compositions comprising a compound of the disclosure formulated in a pharmaceutically
acceptable carrier may be prepared, placed in an appropriate container, and labeled for treatment of an
indicated condition. For polynucleotide or amino acid sequences of RFX4_v3, conditions indicated on
the label may include treatment of conditions related to congenital hydrocephalus.

The pharmaceutical composition may be provided as a salt and can be formed with many
acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc.
Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free base
forms. In other cases, the preferred preparation may be a lyophilized powder in 1 mM-50 mM
histidine, 0.1%-2% sucrose, and 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer
prior to use.
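The lyophilized-powder ranges above can be captured as a simple specification check. This is an illustrative sketch: the class and function names are hypothetical, the bounds are treated as inclusive, and the sucrose and mannitol percentages are assumed to be w/v, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LyophilizedFormulation:
    histidine_mM: float
    sucrose_pct: float   # assumed % w/v
    mannitol_pct: float  # assumed % w/v
    pH: float


# (low, high) bounds taken from the text, treated here as inclusive.
RANGES = {
    "histidine_mM": (1.0, 50.0),
    "sucrose_pct": (0.1, 2.0),
    "mannitol_pct": (2.0, 7.0),
    "pH": (4.5, 5.5),
}


def out_of_range(f: LyophilizedFormulation) -> list[str]:
    """Return the names of parameters that fall outside the stated ranges."""
    return [name for name, (lo, hi) in RANGES.items()
            if not lo <= getattr(f, name) <= hi]


print(out_of_range(LyophilizedFormulation(10.0, 1.0, 5.0, 5.0)))  # prints []
print(out_of_range(LyophilizedFormulation(10.0, 1.0, 5.0, 6.0)))  # prints ['pH']
```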

For any compound used in the method of the disclosure, the therapeutically effective dose can
be estimated initially from cell culture assays. Then, preferably, dosage can be formulated in animal
models (particularly murine models) to achieve a desirable circulating concentration range that adjusts
RFX4_v3 levels.

A therapeutically effective dose refers to that amount of RFX4_v3 that ameliorates symptoms
of the disease state. Toxicity and therapeutic efficacy of such compounds can be determined by
standard pharmaceutical procedures in cell cultures, experimental animals or transgenic animals, e.g.,
for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose
therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic
effects is the therapeutic index, and it can be expressed as the ratio LD50/ED50. Compounds that
exhibit large therapeutic indices are preferred. The data obtained from these cell culture assays and
additional animal studies can be used in formulating a range of dosage for human use. The dosage of
such compounds lies preferably within a range of circulating concentrations that include the ED50 with
little or no toxicity. The dosage varies within this range depending upon the dosage form employed,
sensitivity of the patient, and the route of administration.
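The therapeutic index defined above is a simple ratio. Estimating an ED50 or LD50 normally involves fitting a dose-response model (for example, probit or logistic regression); the sketch below substitutes a much cruder linear interpolation on log10(dose) between the two tested doses that bracket a 50% response. The function names and the dose-response data are hypothetical.

```python
import math


def dose_for_half_response(doses: list[float], fractions: list[float]) -> float:
    """Crude 50% dose estimate by linear interpolation on log10(dose).

    doses must be positive and increasing; fractions are the proportions of
    animals responding (or dying) at each dose and should rise across 0.5.
    """
    pairs = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(pairs, pairs[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            t = (0.5 - f0) / (f1 - f0)
            log_dose = math.log10(d0) + t * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    raise ValueError("observed responses do not bracket 50%")


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index = LD50 / ED50; larger values mean a wider safety margin."""
    return ld50 / ed50


# Hypothetical data (dose in mg/kg, fraction of animals affected).
ed50 = dose_for_half_response([1, 10, 100], [0.1, 0.5, 0.9])
ld50 = dose_for_half_response([100, 1000, 10000], [0.2, 0.8, 1.0])
print(round(ed50, 1), round(ld50, 1), round(therapeutic_index(ld50, ed50), 1))
```

Interpolating on log dose rather than linear dose reflects the roughly log-linear shape of many dose-response curves near their midpoint; a fitted model would also give confidence intervals, which this sketch does not.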

The exact dosage is chosen by the individual physician in view of the patient to be treated.
Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain
the desired effect. Additional factors, which may be taken into account, include the severity of the
disease state; age, weight, and gender of the patient; diet, time and frequency of administration, drug
combination(s), reaction sensitivities, and tolerance/response to therapy. Long acting pharmaceutical
compositions might be administered every 3 to 4 days, every week, or once every two weeks depending
on half-life and clearance rate of the particular formulation.
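How the dosing interval relates to half-life can be illustrated with a one-compartment, first-order elimination model. That model is an assumption made here for illustration (the text does not specify pharmacokinetic behavior), and the 7-day half-life used below is hypothetical.

```python
def fraction_remaining(interval_days: float, half_life_days: float) -> float:
    """Fraction of a dose still present after interval_days (first-order elimination)."""
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (interval_days / half_life_days)


def accumulation_factor(interval_days: float, half_life_days: float) -> float:
    """Steady-state level relative to the level after a single dose,
    for equal doses given every interval_days."""
    if interval_days <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / (1.0 - fraction_remaining(interval_days, half_life_days))


# Compare the intervals mentioned in the text for a hypothetical 7-day half-life.
for interval in (3.5, 7.0, 14.0):
    print(interval,
          round(fraction_remaining(interval, 7.0), 3),
          round(accumulation_factor(interval, 7.0), 2))
```

Under these assumptions, dosing once per half-life doubles the steady-state level relative to a single dose, while dosing every two half-lives leaves only a quarter of each dose by the next administration.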

Normal dosage amounts may vary from 0.1 to 100,000 micrograms, up to a total dose of about
1 g, depending upon the route of administration. Guidance as to particular dosages and methods of
delivery is provided in the literature (see U.S. Pat. Nos. 4,657,760; 5,206,344; or 5,225,212). Those
skilled in the art will employ different formulations for RFX4_v3 than for the inhibitors of RFX4_v3.

The subject matter of the present disclosure is further illustrated by the following non-limiting 
Examples. 


EXAMPLES 

In the experimental disclosure which follows, the following abbreviations apply: eq
(equivalents); M (Molar); uM (micromolar); N (Normal); mol (moles); mmol (millimoles); umol
(micromoles); nmol (nanomoles); g (grams); mg (milligrams); ug (micrograms); ng (nanograms); l or L
(liters); ml (milliliters); ul (microliters); cm (centimeters); mm (millimeters); um (micrometers); nm
(nanometers); °C (degrees Centigrade); U (units); mU (milliunits); min. (minutes); sec. (seconds); %
(percent); kb (kilobase); bp (base pair); PCR (polymerase chain reaction); v/v (volume for volume).

Example 1 

Development of RFX4_v3 Transgenic Mice

In this example, the development of the RFX4_v3 transgenic mice is described. The RFX4_v3
transgenic mice arose from a line of transgenic mice created for the cardiac-specific
expression of human CYP2J2, a cytochrome P450 arachidonic acid epoxygenase, using a mouse
cardiac myosin promoter and a human growth hormone 3'-untranslated region (3'-UTR). The vector
CYP2J2-pBS-αMHC-hGH was constructed; it contains the coding region of the CYP2J2 cDNA, the αMHC
promoter to drive cardiomyocyte-specific expression of the transgene, and human growth hormone
intron/polyA sequences to enhance transgene mRNA stability. The linearized transgene was
microinjected into pronuclei of single-cell mouse embryos that were implanted into pseudopregnant
female mice. Founder pups were identified by a combination of PCR and Southern blotting of tail
genomic DNAs. Offspring from one of the founder lines (line Tr5) had congenital hydrocephalus.
Details of the transgene construction and methods used in creating the transgenic mice are described
below and have been described elsewhere (Yang et al., submitted for publication, 2003), herein
incorporated by reference.

Example 2

Identification of the transgene insertion site 
This example describes methods used to identify the insertion site of the transgene in the
mouse genome. A Universal Genome Walker Kit (Clontech, Palo Alto, CA) was used to identify the
mouse genomic sequences adjacent to the transgene insertion site. Briefly, genomic DNA from
transgenic mice was digested with DraI, EcoRV, PvuII or StuI, and ligated to adaptors supplied by the
manufacturer. PCR amplification of 3'-adjacent sequences utilized the Advantage Genomic PCR Kit
(Clontech), the universal adaptor primers AP1 and AP2, and the following nested gene-specific
primers: 5'-ACAACTCTGCGATGGGCTCTGCTTT-3' (SEQ ID NO: 25) and
5'-CTGACCAATTTGACGGCGCTGCACA-3' (SEQ ID NO: 26). PCR products were cloned into the
pCRII vector utilizing the TA Cloning Kit (InVitrogen/Life Technologies, Carlsbad, CA) and
sequenced using the Big Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems,
Foster City, CA). PCR amplification of 5'-adjacent sequences was similarly performed using the
following nested gene-specific primers: 5'-GGCCATTGTCACCACTCGTAA-3' (SEQ ID NO: 27)
and 5'-CACAAGTAAAGGCTAACGCGC-3' (SEQ ID NO: 28).
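As a quick check on the nested gene-specific primers listed above, their length, GC content, and a rough melting temperature can be computed. The Tm here uses the basic GC-content approximation Tm = 64.9 + 41 × (nG + nC − 16.4) / N, which ignores salt and primer concentration; nearest-neighbor methods are more accurate. These values are computed for illustration only and are not reported in the source; the dictionary and function names are hypothetical.

```python
GENOME_WALKER_PRIMERS = {
    "SEQ ID NO: 25": "ACAACTCTGCGATGGGCTCTGCTTT",  # 3'-adjacent walk
    "SEQ ID NO: 26": "CTGACCAATTTGACGGCGCTGCACA",  # 3'-adjacent walk
    "SEQ ID NO: 27": "GGCCATTGTCACCACTCGTAA",      # 5'-adjacent walk
    "SEQ ID NO: 28": "CACAAGTAAAGGCTAACGCGC",      # 5'-adjacent walk
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq.upper())


def basic_tm(seq: str) -> float:
    """Rough Tm (deg C) from the basic GC formula, intended for primers of 14+ nt."""
    n = len(seq)
    if n < 14:
        raise ValueError("basic formula is intended for primers of 14 nt or more")
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / n


for name, seq in GENOME_WALKER_PRIMERS.items():
    gc_pct = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc_pct:.0f}%, Tm ~{basic_tm(seq):.1f} C")
```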

Example 3 
Plasmids Utilized 

In this example, the plasmids used in developing the RFX4_v3 transgenic mice and in
identifying homologous RFX4_v3 transcripts in other non-mouse species are described. The plasmid
insert containing the 7.5 kb transgene has been described elsewhere (Yang et al., submitted for
publication, 2003); it consists of a 1.8 kb protein coding region of the human cytochrome P450
epoxygenase, driven by 5.5 kb of the mouse cardiac myosin promoter, and contains 1.8 kb of the
human growth hormone 3'-untranslated region. Plasmids containing the indicated human, mouse and
zebrafish ESTs were obtained from the IMAGE consortium. A plasmid containing the putative protein
coding region of mouse RFX4_v3 was made by first using Platinum Pfx polymerase
(InVitrogen/Life Technologies, Carlsbad, CA) to reverse transcribe total adult mouse brain RNA, which
served as the template. The resulting cDNA was then subjected to two rounds of nested PCR using
primers based on the 5' and 3' sequences of apparent mouse brain RFX4 sequences from GenBank. The
first pair of primers corresponded to bp 255-278 of accession number BB873367 and to bp 100-124 of
accession number BB379807, and the second pair corresponded to bp 291-309 of accession number
BB873367 and bp 99-78 of accession number BB379807. The resulting PCR product was sequenced
using the ABI Prism dRhodamine Terminator Cycle Sequencing Ready Reaction Kit (Applied
Biosystems, Foster City, CA).

Probes corresponding to the unique 5'-ends of mouse RFX4_v1 and RFX4_v3 were constructed
by PCR amplification of reverse-transcribed mouse testis RNA or brain RNA, respectively. Reverse
transcription was carried out using 1 µg of total RNA, an anchored oligo(dT) primer (T18VN) and
SuperScript II RNase H- Reverse Transcriptase (Invitrogen Life Technologies, Carlsbad, CA). PCR
was performed using primers based on the sequence for human RFX4_v1 (accession number
NM_032491) or the sequence for mouse RFX4_v3 contained in the mouse brain EST accession
number BB595996. The forward primer for RFX4_v1 was 5'-AGGTGGGAAGGCAGTTATGACAG-3'
(SEQ ID NO: 16; corresponding to bases 1-23 of NM_032491) and the reverse primer was
5'-TCCGTGATATTTCTGCTTAGTGGGC-3' (SEQ ID NO: 17; bases 201-177). A second round of
PCR was carried out with forward primer 5'-GGCAGTTATGACAGTTGAGAAGTAGTAG-3' (SEQ
ID NO: 18; bases 10-37) and reverse primer 5'-CTGCTTAGTGGGCATCTCGAATCTATC-3' (SEQ
ID NO: 19; bases 189-163). The forward primer for mouse RFX4_v3 was
5'-TTTTGACGGGTTTGGCTTTG-3' (SEQ ID NO: 20; bases 118-137 of BB595996) and the reverse
primer was 5'-TTCCTCCAGTAACCCACAATGC-3' (SEQ ID NO: 21; bases 447-426). A probe
corresponding to the unique 5'-end of RFX4_v2 was isolated by PCR amplification from mouse L cell
genomic DNA using primers based on the sequence for human RFX4_v2 (accession number
NM_002920). PCR was carried out using forward primer 5'-TGGAGAGGCCACAGCTGCTGG-3'
(SEQ ID NO: 22; bases 1-21 of NM_002920) and reverse primer 5'-TCGAGGCCTGGTCCTGTCGC-3'
(SEQ ID NO: 23; bases 159-140). A second round of PCR was performed with forward primer
5'-CACAGCTGCTGGCTTCCTGG-3' (SEQ ID NO: 24; bases 10-29) and the same reverse primer as in
the first round of PCR. All three unique 5'-ends of RFX4_v1, RFX4_v2 and RFX4_v3 were
sequenced using the ABI Prism dRhodamine Terminator Cycle Sequencing Ready Reaction Kit
(Applied Biosystems, Foster City, CA).
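The primer coordinates above can be cross-checked mechanically: each primer's length should equal the span of reference bases it is said to cover, and within each nested pair the bases at shared reference positions should agree. The Python sketch below does this using only the sequences and coordinates stated in this example. The dictionary layout, the helper names and the reading of reverse-primer coordinates written high-to-low (e.g. "bases 201-177") as running toward lower positions are assumptions made for illustration; they are not part of the original protocol.

```python
# Hypothetical cross-check of the primer sequences and reference coordinates
# quoted above. Each entry is (sequence, start, end); reverse primers are
# listed with start > end, matching the "bases 201-177" style of the text.
PRIMERS = {
    "SEQ16_v1_fwd_round1": ("AGGTGGGAAGGCAGTTATGACAG", 1, 23),
    "SEQ17_v1_rev_round1": ("TCCGTGATATTTCTGCTTAGTGGGC", 201, 177),
    "SEQ18_v1_fwd_round2": ("GGCAGTTATGACAGTTGAGAAGTAGTAG", 10, 37),
    "SEQ19_v1_rev_round2": ("CTGCTTAGTGGGCATCTCGAATCTATC", 189, 163),
    "SEQ20_v3_fwd": ("TTTTGACGGGTTTGGCTTTG", 118, 137),
    "SEQ21_v3_rev": ("TTCCTCCAGTAACCCACAATGC", 447, 426),
    "SEQ22_v2_fwd_round1": ("TGGAGAGGCCACAGCTGCTGG", 1, 21),
    "SEQ23_v2_rev": ("TCGAGGCCTGGTCCTGTCGC", 159, 140),
    "SEQ24_v2_fwd_round2": ("CACAGCTGCTGGCTTCCTGG", 10, 29),
}

# Nested (outer, inner) primer pairs that anneal to the same strand.
NESTED_PAIRS = [
    ("SEQ16_v1_fwd_round1", "SEQ18_v1_fwd_round2"),
    ("SEQ17_v1_rev_round1", "SEQ19_v1_rev_round2"),
    ("SEQ22_v2_fwd_round1", "SEQ24_v2_fwd_round2"),
]


def span_length(start: int, end: int) -> int:
    """Number of reference bases covered by an inclusive coordinate range."""
    return abs(end - start) + 1


def coordinate_map(seq: str, start: int, end: int) -> dict:
    """Map each reference coordinate covered by a primer to the primer base.

    A reverse primer runs toward lower coordinates, so its k-th base sits at
    start - k. Its bases are complements of the reference, which does not
    matter when two primers on the same strand are compared with each other.
    """
    step = 1 if end >= start else -1
    return {start + step * k: base for k, base in enumerate(seq)}


def nested_pair_agrees(outer: str, inner: str) -> bool:
    """True if two primers overlap and agree at every shared coordinate."""
    a = coordinate_map(*PRIMERS[outer])
    b = coordinate_map(*PRIMERS[inner])
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[pos] == b[pos] for pos in shared)


if __name__ == "__main__":
    for name, (seq, start, end) in PRIMERS.items():
        ok = len(seq) == span_length(start, end)
        print(f"{name}: {len(seq)} nt, span {start}-{end}, length matches: {ok}")
    for outer, inner in NESTED_PAIRS:
        print(f"{outer} / {inner}: overlap agrees: {nested_pair_agrees(outer, inner)}")
```

With the sequences as written here, every primer length matches its stated span and all three nested pairs agree at their shared positions. A check of this kind says nothing about primer specificity or annealing behavior.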

A cDNA corresponding to human RFX4_v3 was cloned by screening a human fetal brain
cDNA library (Stratagene) with the insert from human IMAGE clone #46678 (GenBank accession
number H10145). The resulting cDNA clone was sequenced by dideoxynucleotide techniques (see
above). A plasmid (GenBank accession number AI657628) containing a zebrafish EST sequence that
predicted a protein closely related to the amino terminus of mouse and human RFX4_v3 was also
obtained from the IMAGE Consortium and sequenced by dideoxynucleotide techniques.

Example 4 

Histology and Antibody Staining of Brain Tissue

In this example, the histology and antibody staining of brain tissue from the RFX4_v3
transgenic mice are described. For histology, embryos and tissues from newborn or adult mice were
fixed in Bouin's fixative for 12-48 hours, depending on tissue size, and then cleared in 70% (v/v)
ethanol. Tissues were then embedded in paraffin, sectioned and stained with hematoxylin and eosin by
standard methods. For immunohistochemistry, paraffin sections were stained with an antibody
(Rodriguez et al., Cell Tissue Res., 237:427-41 [1984]) to Reissner's fibers (RF) within the SCO, as
described previously for a different antibody (Blackshear et al., Dev. Brain Res., 96:62-75 [1996]).
The anti-RF antibody was a gift from Dr. E. M. Rodriguez, Instituto de Histología y Patología,
Facultad de Medicina, Universidad Austral de Chile, Valdivia, Chile.


Example 5 
In situ hybridization histochemistry 
This example describes methods for in situ hybridization using brain tissue from the RFX4_v3
transgenic mice. Embryos were dissected in PBS and fixed in 4% (w/v) paraformaldehyde/PBS at 4 °C.
Specimens for whole-mount in situ hybridization were gradually dehydrated in methanol/PBS and
stored in 100% methanol at -80 °C. Specimens for in situ hybridization on frozen sections were
cryoprotected in 30% sucrose and embedded in Tissue-Tek (Sakura), and 20 µm thick sections were
obtained using a cryostat. Whole-mount and section in situ hybridizations were performed according to
the methods of Wilkinson and Tsuchida et al., respectively (Wilkinson et al. (1992) In In situ
hybridization: a practical approach (ed. D. G. Wilkinson), pp. 75-83. Oxford: IRL Press; Tsuchida et
al. (1994) Cell 79, 957-70). The probes used and their sources were as follows: RFX4 (this paper);
Otx2 (Antonio Simeone); Bf1 (Eseng Lai); Fgf8 (Gail Martin); Msx2 (Betham Thomas); Wnt3a and
Wnt7b (Andrew McMahon); Lhx2 (Heiner Westphal); Pax6 and Six3 (Peter Gruss); Emx1, Dlx2 and
Nkx2.1 (J.L.R.R.'s laboratory).
Example 6
Evaluation of transgenic mice
This example describes the results of evaluation of the transgenic mice. A large percentage of
mice in one (Tr5) of six transgenic (TG) lines exhibited head swelling followed by rapid neurological
deterioration and death in young adulthood. The external swelling was apparent from the increased
convexity of the head and the lateral displacement of the ears (Fig. 7A). Histological examination of
the brains of symptomatic adult mice revealed severe hydrocephalus in the anterior brain, with extreme
dilatation of the lateral ventricles but no apparent effect on the fourth ventricle (Fig. 7B). Although
many of the mice developed the severe form of the syndrome within the first two months of life,
sufficient mice survived to propagate the line. Nonetheless, examination of the brains of successful
adult breeders showed severe hydrocephalus, with extreme lateral ventricle dilatation, the formation
of false ventricles near the external capsule, and midline structural disruption. These findings were
compatible with an obstructive hydrocephalus, and were consistent with the form of hydrocephalus
seen with stenosis of the aqueduct of Sylvius, or aqueductal stenosis. It should be noted that CYP2J2
transgene expression was not detected in brains from the TG mice, as evaluated with two different
CYP2J2-specific antibodies on western blots.

Examination of TG mice from the Tr5 line at the time of birth (P0.5) showed that severe
hydrocephalus was present in all mice harboring the transgene, indicating that the hydrocephalus was
congenital. In contrast, none of the wild-type (WT) littermates had hydrocephalus. The hydrocephalus
was most apparent in the olfactory and lateral ventricles, with apparent sparing of the fourth ventricle
(Fig. 8). These data support the possibility of a congenital obstruction in the aqueduct of Sylvius.

Examination of the aqueduct in serial coronal sections from a TG mouse and its WT littermate
at P0.5 showed the apparent absence of the subcommissural organ (SCO) in the transgenic mouse
(Fig. 9A). This organ produces Reissner's fibers, and both the organ and the fibers have been shown to
be important for the patency of the aqueduct, in that destruction of the SCO leads to obstructive
hydrocephalus (Perez-Figares et al., 2001). Antibodies specific to Reissner's fibers (Rodriguez et al.,
1984; Rodriguez et al., 2001; Rodriguez et al., 1998) strongly and specifically labeled the SCO from
the WT mice (Fig. 9B), but this label was generally not detected in the same anatomical region in the
TG mice. Rarely, a small amount of staining could be found in sections from the TG mice at the
anatomical location that should have contained the SCO (Fig. 9B); however, this staining was always
markedly less than that seen in the WT mice. Although the SCO appeared to be largely absent in the
TG mice, other midline structures, such as the pineal body and posterior commissure, were present and
appeared to be anatomically normal.

We next examined the birth statistics from this line of transgenic animals for Mendelian
frequencies. For crosses in which TG mice were bred to WT mice, there were 6.7 +/- 0.4 (SE) live
births per litter based on data from 47 litters. Of 315 pups born, 46% were TG and 54% were WT. For
comparison, TG mice originating from another founder line crossed with WT mice resulted in 7.0 +/-
0.4 (SE) live births per litter based on data from 45 litters, with 52% of 317 pups genotyped as TG.
These data suggest minimal, if any, prenatal loss of TG pups, despite the presence of congenital
hydrocephalus. In the TG mice, severe hydrocephalus requiring euthanasia developed in about 75% of
the mice at an average age of 47 +/- 3 days (range 24-84 days). There was no significant difference in
frequency of hydrocephalus between males and females. The hydrocephalus phenotype has persisted
in TG mice through nine generations.
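Whether the transgene ratios above depart from the 1:1 ratio expected for a hemizygous x WT cross can be checked with a normal approximation to the binomial. The Python sketch below is illustrative only and is not an analysis reported here: the text gives percentages rather than raw genotype counts, so the rounded fractions (46% of 315 pups and 52% of 317 pups) are used directly, and the function names are hypothetical.

```python
import math


def tg_fraction_z(n_pups: int, tg_fraction: float, expected: float = 0.5) -> float:
    """z score of an observed transgenic fraction against the expected
    fraction, using the normal approximation to the binomial."""
    standard_error = math.sqrt(expected * (1 - expected) / n_pups)
    return (tg_fraction - expected) / standard_error


def two_sided_p(z: float) -> float:
    """Two-sided p value, P(|Z| >= |z|), for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Figures quoted in the text (fractions are rounded to whole percents).
LINES = [("Tr5 x WT", 315, 0.46), ("other founder line x WT", 317, 0.52)]

if __name__ == "__main__":
    for label, n, frac in LINES:
        z = tg_fraction_z(n, frac)
        print(f"{label}: z = {z:.2f}, two-sided p = {two_sided_p(z):.2f}")
```

Under these assumptions the Tr5 line gives z of about -1.42 (p about 0.16) and the comparison line z of about 0.71 (p about 0.48), so neither deviation from 1:1 would be significant at the 5% level, in keeping with minimal prenatal loss of TG pups.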
All other non-brain tissues of the TG mice appeared to be histologically normal.

Example 7 

Identification of genomic sequences flanking the transgene 
This example describes the identification of genomic sequences flanking the transgene. 

Because it appeared that the transgene had interrupted the coding or regulatory regions of an important
gene, the mouse genomic sequences flanking the transgene were identified. PCR based on 5' and 3'
transgene sequences showed that there were at least two tandem copies of the 7.5 kb transgene in
genomic DNA from the TG mice, indicating that the potential genomic interruption was at least 15 kb in
size; Southern analysis using a transgene-specific probe indicated that there was only one copy of this
concatenated transgene in the mouse genome. Using the "Genome Walker" technique with genomic
DNA from the TG mice and transgene-specific oligonucleotide primers, both the 5' and 3' flanking
genomic sequences into which the transgene had been inserted were identified. When these sequences
were compared to the mouse genomic sequences in the GenBank trace archives, the transgene insertion
site was identified as between bp 528 and 529 in gnl|ti|13973384 and between bp 171 and 172 in
gnl|ti|84074979. The 5' and 3' flanking sequences identified by the Genome Walker technique were
contiguous in the normal mouse genomic sequences in the trace archives, indicating that the transgene
insertion was not accompanied by a genomic deletion, as has been seen in some recent examples of
accidental transgenic insertional mutagenesis (Durkin et al. (2001) Genomics 73, 20-7; Overbeek et al.
(2001) Genesis 30, 26-35). Southern analysis using a 3'-insertion site-specific probe demonstrated the
presence of single novel bands in restriction enzyme-digested DNA from the transgenic mice,
confirming a single transgene insertion site at this location (Fig. 10A).

The flanking sequences identified by the Genome Walker approach were merged with the
available mouse genomic sequence from the trace archives to form a small contig; this contig did not
match any cDNAs or expressed sequence tags (ESTs) in the database at that time. Therefore, the
assembled mouse contig was used to search the human genome sequences then available in GenBank,
using blastn. The mouse sequence was highly related (E value 4e-28) to a human genomic sequence
corresponding to a portion of human chromosome 12 (accession number NT_009720.8). When this
small region of human genomic sequence was analyzed for expressed sequences, it did not match any
deposited in GenBank. However, when a much larger amount of human genomic DNA from this locus
was used to search for expressed sequences, genomic DNA within 200 kb of the human sequence
corresponding to the transgene insertion site was found to contain all of the exons of two distinct
cDNAs in GenBank that correspond to two forms of the human winged helix protein RFX4. One is
represented by GenBank accession number NM_032491, referred to as RFX4 variant transcript 1, or
RFX4_v1, and corresponds to protein accession number NP_115880; the other is represented by
GenBank accession number NM_002920, is referred to as RFX4 variant transcript 2, or RFX4_v2,
and corresponds to protein accession number NP_002911. The naming conventions used here follow
the recommendations of the HUGO Gene Nomenclature Committee.

According to the mouse-human alignments, the site of the transgene insertion within the
mouse genome corresponded to a region within the human chromosome 12 sequence that would
lie within the intron between exons 13 and 14 of RFX4_v1 (see below); it would not have affected the
exon arrangement of RFX4_v2.

Using PCR primers based on the inserted transgene and the neighboring endogenous mouse
genomic DNA, the WT (+/+) and transgene-interrupted alleles (+/- for one allele disrupted, -/- for both
alleles disrupted) were found to be readily distinguished in a litter of newborn mice from interbred TG
mice (Fig. 10B).

To examine the possibility that the transgene insertion had in some way interfered with the
expression of a full-length mouse RFX4 transcript in brain, northern blots from brains of neonatal +/+,
+/- and -/- mice were probed with a mouse brain EST cDNA clone (IMAGE #763537, GenBank
accession numbers AA285775 and AI462920) that was highly related (E value e-124 over 284 aligned
bases) to the 3'-end of the human cDNA for RFX4_v1. Brains from the +/+ mice expressed a prominent
band of ~4 kb, referred to here as RFX4 variant transcript 3, or RFX4_v3 (Fig. 10C; see below). Brains
from the +/- mice expressed approximately 50% of the normal complement of this transcript, whereas
the brains from the -/- mice expressed no detectable transcript of this size (Fig. 10C). Probing the same
blot with an actin cDNA demonstrated that gel loading was similar in the three lanes (Fig. 10C).
Similar results were obtained in three separate experiments. There was no evidence for the expression
of a truncated mRNA in the brain samples from either the +/- or -/- mice. These studies confirmed that
an mRNA species of ~4 kb that was recognized by a probe derived from putative mouse 3' RFX4_v1
sequences was decreased in amount in brains of the +/- mice, and absent from the brains of the -/- mice.
These data suggested that the insertion of the transgene interfered with the expression of the putative
brain RFX4_v3 transcript.

Using the same probe to examine the tissue-specific and developmental expression of this
RFX4 transcript, high-level expression of a slightly smaller transcript was found in normal adult testis,
and lower-level expression of a considerably smaller transcript was found in liver (Fig. 10D). The
largest species, corresponding to the apparent brain-specific transcript labeled RFX4_v3 in Fig. 10D,
was the only one detected in whole embryos early in development (Fig. 10E). These data suggested
that an isoform of RFX4 that appeared brain-specific in the adult was highly expressed in the whole
embryo during early development, initially appearing between embryonic day (E) 7.5 and 9.5
(Fig. 10E).

Example 8

Identification of the RFX4_v3 transcripts and proteins

This example describes the results obtained from identification of the RFX4_v3 transcripts
and proteins. Using primers based on mouse brain EST sequences that contained internal sequences
highly related to the human RFX4 cDNAs in GenBank, PCR and an adult mouse brain cDNA library
were used to generate a ~3 kb plasmid insert that was then sequenced. This cDNA has been
designated RFX4 transcript variant 3 (RFX4_v3), and the mouse sequence was deposited in GenBank
(accession number AY102010). When this sequence was merged with all available 5' and 3' mouse
ESTs from GenBank, the resulting transcript was 3952 bases, closely approximating the transcript size
seen on northern blots. In addition, a cDNA sequence was deposited in GenBank on Dec. 5, 2002
(GenBank accession number AK034131.1) that was 3535 bases in length; over this length, it was more
than 99% identical to the putative RFX4_v3 full-length transcript described above, and it includes the
entire putative protein coding region. This cDNA was isolated from an adult male mouse diencephalon
library and confirms the existence in brain of at least the protein coding region of our predicted
full-length RFX4_v3 transcript.
Probes similar to those used to generate the northern blots shown in Fig. 10 were then used to
screen a human brain cDNA library, and positive inserts were sequenced. This cDNA sequence has
been deposited in GenBank as human RFX4_v3 (accession number AY102009; SEQ ID NO: 7). The
predicted unique mouse amino-terminal protein sequence (see below) also was used to search the
non-human, non-mouse ESTs in GenBank, and a zebrafish EST clone (accession number AI657628) with
a nearly identical predicted amino-terminal protein sequence was obtained from the IMAGE Consortium
and sequenced. This sequence is referred to as zebrafish RFX4_v3, and the complete insert cDNA
sequence has been assigned accession number AY102011 (SEQ ID NO: 9).

An alignment of these three predicted amino acid sequences is shown in Fig. 6. There was
96% amino acid identity between the predicted mouse and human proteins, and 83% amino acid
identity between the predicted human and zebrafish proteins. The alignment also illustrates several of
the characteristic domains of the RFX proteins that are highly conserved in all three orthologues, i.e.,
the DNA binding domain, boxes B and C, and the dimerization domain (Morotomi-Yano et al. (2002)
J Biol Chem 277, 836-42).
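The identity figures above depend on how an alignment is scored, and the text does not state the alignment program or convention used. As an illustration only, the Python sketch below computes percent identity for an already-aligned pair of sequences using one common convention: identical residues divided by the number of alignment columns, with gap columns counted in the denominator. Some tools divide by the length of the shorter sequence instead, which can give a different number. The function name and the toy sequences are hypothetical and are not RFX4 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical, non-gap columns are divided by the number of alignment
    columns; columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 10 columns, one mismatch (Y/F) and one gap column.
    print(percent_identity("MKTAYIA-QR", "MKTAFIAGQR"))
```

For the toy pair, 8 of the 10 columns are identical, so the function returns 80.0. Obtaining the alignment itself (for example with a global or local alignment program) is a separate step that this sketch does not attempt.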

Human chromosome 12 sequence was then re-searched with the mouse and human cDNA
sequences, and the exons that contributed to the novel human RFX4_v3 isoform described here were
identified, in addition to those described above that corresponded to the two previously described
human cDNAs. The results of this analysis are shown in Fig. 2. The two previously described human
RFX4 cDNAs are composed of both unique and shared exons. In the case of the cDNA represented by
accession number NM_002920 (RFX4_v2), the first five exons (shown in Fig. 2) correspond to five
exons within the 90 kb interval between bp 390,000 and 480,000 of the genomic clone NT_009720.8 (in
reverse complement orientation). The next nine exons and part of a tenth are common to the other
version of RFX4 in GenBank (RFX4_v1), represented by the cDNA NM_032491. These 10 exons are
derived from coding sequences in the genomic clone NT_009720.8 between bp 340,000 and 400,000. As
shown in Fig. 2, the final (15th) exon of RFX4_v2 contains a polyadenylation sequence that allows for
final processing of the mature mRNA.

The other human cDNA, RFX4_v1 (NM_032491), contains a 5' exon that is encoded by
genomic sequences in NT_009720.8 located between exons 5 and 6 of RFX4_v2 (Fig. 2)
and is unique to that cDNA. RFX4_v1 then shares 10 exons with RFX4_v2, followed by three unique
3' exons. These last three unique exons are found within the interval between bp 315,000 and 325,000
of the genomic clone NT_009720.8. Remarkably, exon 12 of RFX4_v1 is apparently spliced into exon
15 of RFX4_v2, resulting in the novel 3' end of RFX4_v1 and a different poly(A) tail. The displaced
sequence in RFX4_v2 is represented as exon 15B in Fig. 2.

The exon pattern that corresponds to the mouse and human RFX4_v3 mRNAs and proteins is
illustrated at the bottom of Fig. 2. A novel exon derived from a sequence between bp 480,000 and
500,000 of NT_009720.8 encodes the first 14 amino acids at the amino terminus (Fig. 2). The
next four exons, 2-5, are the four exons of the same number from RFX4_v2; exon 1 of
RFX4_v2 is not present in the RFX4_v3 cDNA. The middle of the RFX4_v3 cDNA and protein is
formed by the 10 exons held in common between RFX4_v1 and RFX4_v2. The carboxyl terminus of
RFX4_v3 is composed of the three carboxyl-terminal exons present only in RFX4_v1. Thus, the novel
RFX4_v3 isoform described here is composed of a unique arrangement of 18 exons derived from
almost 200 kb of human genomic sequence. One exon (the first) is unique to this sequence; exons 2-5
are shared with RFX4_v2; exons 6-15 are shared with both RFX4_v1 and RFX4_v2; and exons 16-18
are shared only with RFX4_v1.
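The exon bookkeeping above can be made explicit as a small lookup table. The Python sketch below encodes only what the text states: RFX4_v1 consists of a unique first exon, the 10 exons shared with RFX4_v2 and three unique 3' exons, while RFX4_v2 has five 5' exons followed by the shared exons. Numbering the RFX4_v1 exons consecutively in that order is an assumption made for illustration, as are the function and key names. Under that assumption, the intron between RFX4_v3 exons 17 and 18 corresponds to the intron between RFX4_v1 exons 13 and 14, the insertion site given earlier for RFX4_v1, and neither flanking exon belongs to RFX4_v2.

```python
def build_v3_exon_map() -> dict:
    """Map each RFX4_v3 exon number to the corresponding exon number in
    RFX4_v1 and/or RFX4_v2, following the composition described in the text.
    An empty entry means the exon is unique to RFX4_v3."""
    exon_map = {1: {}}                          # exon 1: unique to RFX4_v3
    for k in range(2, 6):                       # exons 2-5: same-numbered v2 exons
        exon_map[k] = {"v2": k}
    for k in range(6, 16):                      # exons 6-15: shared by v1 and v2
        exon_map[k] = {"v1": k - 4, "v2": k}    # (v2 exon 15 is shared only in part)
    for k in range(16, 19):                     # exons 16-18: v1-only 3' exons
        exon_map[k] = {"v1": k - 4}
    return exon_map


if __name__ == "__main__":
    v3 = build_v3_exon_map()
    print("total RFX4_v3 exons:", len(v3))
    # The transgene lies in the intron between RFX4_v3 exons 17 and 18.
    left, right = v3[17], v3[18]
    print("flanking RFX4_v1 exons:", left["v1"], right["v1"])
    print("flanking exons present in RFX4_v2:", "v2" in left or "v2" in right)
```

A plain dictionary keyed by RFX4_v3 exon number keeps the correspondence readable and makes each statement in the paragraph above a one-line lookup.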

The site of transgene interruption is also illustrated in Fig. 2. The >15 kb transgene was
inserted into the intron between exons 17 and 18 of RFX4_v3, within the carboxyl-terminal end of the
protein coding region, and presumably interferes with splicing of the final exon and generation of an
intact mature mRNA. No evidence has been found to date that a stable truncated mRNA species results
from this transgene insertion.

Specific cDNA probes corresponding to unique 5' sequences were designed and cloned for
each of the three RFX4 transcript variants RFX4_v1, v2 and v3. These were then used to probe
northern blots of RNA from brains of E18.5 mice as well as from adult testes, liver and brain. A probe
that spanned regions common to the RFX4_v1, v2 and v3 transcripts hybridized to two major mRNA
species in testes, a single transcript of intermediate size in liver, and a single transcript of the largest
size (~4 kb) in RNA from adult brain. This probe hybridized only to the 4 kb RNA species in brains
from E18.5 mice; the amount of hybridization of this probe decreased from the +/+ to the +/- mouse
brain, and was undetectable in brain from the -/- sample. When similar blots were hybridized with a
probe specific for v1 and v3, only the larger of the two testis transcripts (v1) was detected, while the
largest transcript (v3) was again identified in the adult brain sample and in the brain from E18.5 +/+
fetal mice. Again, the expression of the transcript hybridizing to this probe decreased with decreasing
allelic dosage.

The identities of the various transcripts were determined by the use of transcript-specific
probes, which confirmed the assignments of the v1 and v2 transcripts in testis, and the complete
absence of hybridization of either probe to transcripts from normal adult brain (Fig. 11) or from brains
of E18.5 +/+, +/- and -/- mice. There was no evidence of compensatory expression of either the v1
or v2 transcripts in the E18.5 brains of the -/- mice. The v3-specific probe was used to confirm the
identity of the single, large transcript in brain as RFX4_v3, and also confirmed its allelic dose-related
expression in E18.5 mouse brain (Fig. 11). These data indicate that the v3 transcript variant is the only
form significantly expressed in the adult and fetal brain, and also confirm it as the transcript variant
expressed in the whole embryo and brain earlier in development (see Fig. 10E).
The apparently liver-specific transcript may represent an "RFX4_v4", or it could represent
cross-hybridization of the longer probes to another member of the RFX transcript family that is highly
expressed in liver.

Example 9 

Analysis of RFX4_v3 transcript expression during development

This example describes the pattern of RFX4_v3 transcript expression in mouse embryos, as
analyzed using RNA in situ hybridization. A probe was used that contained sequences specific to both
RFX4_v1 and v3. RFX4_v3 RNA was found primarily in the brain, where its regional expression was
highly dynamic during development. At E8.5, RFX4_v3 expression was detected in most of the neural
plate, but its expression was excluded from the presumptive forebrain region (Fig. 12A,B). By E9.5,
most of its expression encompassed two large regions: the caudal diencephalon/mesencephalon and
the spinal cord (Fig. 12C). The rostral limit of the diencephalic expression approximated the zona
limitans; the only expression extending anterior to this boundary was in the caudodorsal telencephalon
(Fig. 12C).

At E10.5, RFX4_v3 expression extended throughout the neural tube (Fig. 12D-F). In the 
telencephalon, its expression was limited to the cerebral cortex. Expression in the telencephalic dorsal 
midline was not detectable (Fig. 12F-H, arrowheads), and remained negative from that time onward 
during development. Thus, expression in the telencephalic roof plate was temporally restricted to the 
period just after neural tube closure (~ E9.5). 

Transient RFX4_v3 expression appeared in the central retina. The lateral optic stalks also 
exhibited RFX4_v3 expression (Fig. 12H), while the medial optic stalks showed expression at later 
stages (Fig. 12K). 

From E12.5 to birth, the neuroepithelium and later the ependyma of most of the neural tube
expressed variable levels of RFX4_v3 transcripts. For example, in the cerebral cortex, RFX4_v3 was
expressed in a dorsal to ventral gradient (Fig. 12K). The majority of roof plate derivatives of the
central nervous system, including most of the circumventricular organs, had turned off RFX4_v3
expression by this stage (for example, the epiphysis, and the choroid plexus of the lateral and fourth
ventricles in Fig. 12L,M). A striking exception to this pattern was the expression of RFX4_v3 in the
region of the developing SCO in the caudal diencephalon, where there was strong expression
from E14.5 to birth (Fig. 13C,E-G).

The only RFX4_v3-positive structures noted outside of the central nervous system were the
trigeminal and facial/vestibular ganglia (Fig. 12I) and the anterior pituitary (Fig. 13B).

Example 10 
Phenotype of RFX4_v3-deficient mice 

This example describes the phenotype of RFX4_v3-deficient mice. Surviving TG mice,
which are referred to as RFX4_v3 +/- mice, were interbred to generate -/- mice. Ten pregnant +/- mice
were allowed to carry to term and deliver; the average litter size of these pregnancies was 5.3 +/- 0.6,
which was significantly smaller than that of litters from a control line (7.0 +/- 0.4; p = 0.022). Of 53
pups born, 19 (36%) were WT, 28 (53%) +/-, and 6 (11%) -/-, suggesting substantial intrauterine or
perinatal loss of the -/- pups. All of the -/- pups born died within an hour of birth. Seven additional
litters were obtained between E8 and E18. The average size of those litters was 8.7 +/- 0.5, which was
not significantly different from control litters. Of 61 pups obtained, there were 10 (16%) +/+, 36 (59%)
+/- and 15 (25%) -/-, indicating no excess intrauterine mortality.
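The genotype counts above can be compared with the 1:2:1 ratio expected from a +/- x +/- intercross using a chi-square goodness-of-fit test. The text does not state that such a test was performed; the Python sketch below is an illustrative calculation on the reported counts only, and the function name is hypothetical. With three genotype classes and no parameters estimated from the data there are 2 degrees of freedom, for which the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_1_2_1(wt: int, het: int, hom: int) -> tuple:
    """Chi-square goodness of fit of (+/+, +/-, -/-) counts to a 1:2:1 ratio.

    Returns (statistic, p_value) for 2 degrees of freedom.
    """
    n = wt + het + hom
    expected = (n / 4, n / 2, n / 4)
    observed = (wt, het, hom)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-statistic / 2)   # exact upper-tail probability for df = 2
    return statistic, p_value


if __name__ == "__main__":
    # Counts as reported above: pups at birth vs. embryos collected E8-E18.
    for label, counts in [("born", (19, 28, 6)), ("E8-E18", (10, 36, 15))]:
        stat, p = chi_square_1_2_1(*counts)
        print(f"{label}: chi-square = {stat:.2f}, p = {p:.3f}")
```

On these counts the pups at birth give a chi-square of about 6.55 (p about 0.038), driven mainly by the shortfall of -/- pups, whereas the embryonic collections give about 2.80 (p about 0.25). This is consistent with the conclusion above that the deficit of -/- pups at birth does not reflect excess intrauterine mortality. All expected class counts exceed 13, so the chi-square approximation is reasonable here.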

The brains of the -/- mice at the time of birth and at E16.5 were grossly dysmorphic. Therefore, the
-/- mice were examined at an earlier developmental stage, E12.5. The phenotype at this age was
striking (Fig. 14). Externally, there were clear abnormalities of head appearance, although the position
of the eyes, vibrissae and other facial structures appeared relatively normal (Fig. 14A). Coronal
sections suggested that dorsal structures in the rostral brain were hypoplastic and lacked morphological
differentiation of medial and paramedial dorsal structures. This was most striking in the forebrain and
midbrain (Fig. 14B), but abnormalities persisted into the hindbrain and spinal cord. As in the
hemizygotes, the anatomy of the rest of the body in the E12.5 -/- embryos was apparently normal.
To characterize the patterning of the mutant brains, the expression of genes that play
important roles in regionalization was analyzed (Marin and Rubenstein (2002) In Mouse Development
(ed. J. Rossant and P. Tam), pp. 75-106: Academic Press). The analysis focused mainly on the
telencephalon of E12.5 -/- embryos (Fig. 15). The lateral walls of the telencephalic vesicles primarily
consist of the basal ganglia (rostroventral) and the cerebral cortex (caudodorsal). The rostral and
rostrodorsal midline is constituted by the commissural plate and adjacent parts of the septal area; the
caudodorsal midline consists of the choroid plexus and the cortical hem. The cortical hem is a Wnt-
and BMP-rich signaling center in the dorsomedial telencephalon that has been shown to be crucial in
cortical development (Furuta et al. (1997) Development 124, 2203-12; Galceran et al. (2000)
Development 127, 469-82; Grove et al. (1998) Development 125, 2315-25; Lee et al. (2000)
Development 127, 457-67).

Expression of the telencephalic marker Foxg1 (Bf1) was maintained in the cortex and basal
ganglia of RFX4_v3 mutants. The expression of markers specific for midline structures, the cerebral
cortex and the basal ganglia revealed that the principal telencephalic defects in RFX4_v3 mutants
involved severe hypoplasia of the dorsal midline and adjacent cerebral cortex (Fig. 15). The lack of
dorsal midline structures was demonstrated by the loss of Wnt3a, Wnt7b and Bmp4 expression in the
hem (Fig. 15E, F and not shown) and the reduction of Msx2 expression in the hem and choroid plexus
(Fig. 15D). The cerebral cortex was present, based on the expression of Wnt7b, Emx1, Pax6 and Lhx2
(Fig. 15F-I); however, it was severely hypoplastic. Despite the severe hypoplasia, the cortex did
produce post-mitotic cells, based on the mantle zone expression of Wnt7b (Fig. 15F).
In wild-type mice, Lhx2 and Emx1 are expressed in a dorsoventral gradient in the cortical
neuroepithelium. In the RFX4_v3 mutants, Lhx2 and Emx1 expression levels were similar to those
seen in the ventral part of the normal cortex, suggesting that dorsal parts of the cortex were missing
(Fig. 15G,I). An Emx1-negative, Lhx2-positive territory intercalated between the striatum and the
prospective piriform cortex, which develops into parts of the claustroamygdaloid complex (Puelles et
al. (2000) J Comp Neurol 424, 409-38; Yun et al. (2001) Development 128, 193-205), was maintained
in the mutants (Fig. 15G,I). Finally, Pax6 is normally detected in a ventrodorsal gradient; in the
mutants, only the ventral, more strongly expressing area was detected (Fig. 15H). Thus, the most
ventral subdivisions of the cortex, located adjacent to the striatum, i.e., the piriform cortex and parts of
the claustroamygdaloid complex, seemed to be correctly specified, while the most medial cortical
subdivisions, located adjacent to the cortical hem, i.e., the hippocampus and the neocortex, were either
severely reduced, lost, or mis-specified.

WO 03/088919 PCT/US03/12348

The basal ganglia are formed in mammals by the lateral ganglionic eminence, which develops
into the striatum, and the medial ganglionic eminence, which develops into the pallidum (Marin and
Rubenstein (2002) In Mouse Development (ed. J. Rossant and P. Tam), pp. 75-106: Academic Press).
In the mutants, although the basal ganglia were disproportionately large compared to the cortex,
it was unclear whether there was an absolute increase in the sizes of the lateral and medial
ganglionic eminences. The RFX4_v3 mutants exhibited normal expression of the Dlx2 and Six3
transcription factors in the lateral and medial ganglionic eminences (Fig. 15J,K). Expression of Otx2,
Fgf8 and Six3 in the septum, a basal ganglia-related structure, was detected as well (Fig. 15B,C,J). In
addition, the specific expression of the transcription factor Nkx2.1 in the medial ganglionic eminence
and ventral septum was apparently normal in the mutants (Fig. 15L).

Example 11 
Other embodiments 

In some embodiments, an isolated and purified nucleic acid comprises a sequence encoding a
protein selected from the group consisting of SEQ ID NOS: 5, 7 and 9. Alternatively, the nucleic acid
sequence is selected from the group consisting of SEQ ID NOS: 5, 7, and 9 and variants thereof that are
at least 90% identical. In some embodiments, the present disclosure provides a nucleic acid sequence
selected from the group consisting of SEQ ID NOS: 5, 7, and 9 and variants thereof that are at least
80% identical. In some embodiments, the nucleic acid sequence is selected from the group consisting
of SEQ ID NOS: 5, 7, and 9 and variants thereof that are at least 70% identical. The nucleic acid
sequence may be operably linked to a heterologous promoter (e.g., SEQ ID NOS: 11 or 12). The
nucleic acid sequence may be contained within a vector, and the vector may be present within a host
cell.

In other embodiments, an isolated and purified nucleic acid sequence hybridizes under
conditions of low stringency to a nucleic acid selected from the group consisting of SEQ ID NOS: 5, 7,
and 9. In some embodiments, the nucleic acid sequence encodes a protein (e.g., SEQ ID NOS: 6, 8, or
10), or is included in a vector. The vector may be within a host cell, and the host cell may, for
example, be located in an organism such as a plant, an animal, or a prokaryote.

In yet other embodiments, the protein is selected from the group consisting of SEQ ID NOS:
6, 8, and 10 and variants thereof that are at least 90% identical to SEQ ID NOS: 6, 8, or 10, wherein
the protein has at least one activity of RFX4_v3. In other embodiments, the present disclosure
provides a protein selected from the group consisting of SEQ ID NOS: 6, 8, and 10 and variants thereof
that are at least 80% identical to SEQ ID NOS: 6, 8, or 10, wherein the protein has at least one
activity of RFX4_v3. In other embodiments, the present disclosure provides a protein selected from
the group consisting of SEQ ID NOS: 6, 8, and 10 and variants thereof that are at least 70% identical to
SEQ ID NOS: 6, 8, or 10, wherein the protein has at least one activity of RFX4_v3.
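The identity thresholds recited above (at least 90%, 80% or 70%) presuppose a way of scoring sequence identity, which this passage does not specify. As an illustration only, the following Python sketch assumes the two sequences have already been aligned (for example, by a standard alignment tool). It scores identity as identical residues divided by the alignment columns in which at least one sequence has a residue. The function names, the "-" gap character and this scoring convention are assumptions made for the example; they are not the method of the disclosure. The example pair is the 60-residue human and mouse segment from the Exons 16-18 region of the alignment figure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed (hypothetical) convention: identical residues divided by the
    alignment columns in which at least one sequence has a residue.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b and a != gap:
            matches += 1  # identical residue in this column
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical (e.g. 70, 80, 90)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Human and mouse segments (Exons 16-18 region of the alignment figure).
human = "EHGYTGSYNYGSYGNQHPHPMQSQYPALPHDTAISGPLHYAPYHRSSAQYPFNSPTSRME"
mouse = "EHGYTGSYNYGSYGNQHPHPLQNQYPALPHDTAISGPLHYSPYHRSSAQYPFNSPTSRME"

print(f"{percent_identity(human, mouse):.1f}%")  # 57 of 60 columns identical
for t in (70, 80, 90):
    print(t, meets_threshold(human, mouse, t))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full length of the reference sequence, can give a different percentage for the same pair. For that reason, the convention should be fixed before a variant is compared against a threshold.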

Various modifications and variations of the described method and system of the disclosure 
will be apparent to those skilled in the art without departing from the scope and spirit of the disclosure. 
Although the disclosure has been described in connection with specific preferred embodiments, it 
should be understood that the disclosure as claimed should not be unduly limited to such specific 
embodiments. Indeed, various modifications of the described modes for carrying out the disclosure, 
which are obvious to those skilled in the relevant fields, are intended to be within the scope of the 
following claims. 
We claim:

1. A substantially purified RFX4_v3 polypeptide.

2. The polypeptide of claim 1, wherein the polypeptide comprises: 

a) an amino acid sequence at least 70% identical to an amino acid sequence set 
forth as SEQ ID NO: 8; 

b) a conservative variant of the amino acid sequence set forth as SEQ ID NO: 8; or

c) the amino acid sequence set forth as SEQ ID NO: 8, 

wherein the polypeptide has RFX4_v3 activity, and the N-terminus of the 
polypeptide is at least 90% identical to residues 1-14 of SEQ ID NO: 8. 

3. The polypeptide of claim 2, wherein the polypeptide comprises an amino acid
sequence set forth as SEQ ID NO: 6 or SEQ ID NO: 10.

4. The polypeptide of claim 2, wherein the polypeptide comprises an amino acid
sequence set forth as SEQ ID NO: 8, or a sequence having at least 95% sequence identity to SEQ ID
NO: 8.

5. An isolated nucleic acid molecule encoding the polypeptide of claim 2. 

6. The nucleic acid of claim 5, wherein the nucleic acid molecule comprises:

a nucleic acid sequence at least 70% identical to the nucleic acid sequence set forth
as SEQ ID NO: 37.

7. The nucleic acid of claim 6, wherein the nucleic acid sequence is at least 90% 
identical to SEQ ID NO: 38 or SEQ ID NO: 39. 


8. The nucleic acid of claim 6, wherein the nucleic acid sequence is at least 90% 
identical to SEQ ID NO: 37. 

9. The nucleic acid sequence of claim 5, wherein the nucleic acid sequence is operably
linked to a heterologous promoter.

10. The nucleic acid sequence of claim 5, wherein the heterologous promoter comprises
SEQ ID NO: 11 or SEQ ID NO: 12.

11. A vector comprising the nucleic acid of claim 5.
12. A host cell transformed with the vector of claim 11.

13. The host cell of claim 12, wherein the host cell is a plant cell, an animal cell, or a
prokaryotic cell.

14. A composition comprising the polypeptide of claim 2. 

15. An isolated nucleic acid molecule that hybridizes under conditions of low stringency
to a target nucleic acid molecule selected from the group consisting of nucleotides 1-42 of SEQ ID NO:
37, SEQ ID NO: 38, and SEQ ID NO: 39, wherein the isolated nucleic acid molecule is at least 15
nucleotides in length.

16. The isolated nucleic acid molecule of claim 15, that hybridizes under conditions of
high stringency to the target nucleic acid molecule.

17. The nucleic acid of claim 15, wherein the target nucleic acid molecule encodes a
RFX4_v3 polypeptide.

18. The nucleic acid of claim 12, wherein the RFX4_v3 polypeptide comprises SEQ ID
NO: 6, SEQ ID NO: 8, or SEQ ID NO: 10.

19. A vector comprising the nucleic acid of claim 15. 

20. A host cell transformed with the vector of claim 19.

21. The host cell of claim 20, wherein the host cell is a plant cell, an animal cell, or a 
prokaryotic cell. 

22. The polypeptide of claim 2, wherein the RFX4_v3 activity comprises inhibiting the
phenotypic expression of congenital hydrocephalus.

23. The polypeptide of claim 2, wherein the activity is the ability to bind to RFX4_v3 
specific antibodies. 


24. The polypeptide of claim 2, wherein the polypeptide comprises the amino acid 
residues set forth in SEQ ID NO: 33, SEQ ID NO: 34, or SEQ ID NO: 35. 

25. A method for producing a variant of a RFX4_v3 polypeptide, wherein the method
comprises:

mutagenizing the wild-type nucleic acid sequence of SEQ ID NO: 37, SEQ ID NO:
38, or SEQ ID NO: 39; and

screening the variant for a RFX4_v3 activity. 

26. A composition comprising a nucleic acid molecule that inhibits the binding of the 
first 42 nucleotides of SEQ ID NO: 37, SEQ ID NO: 38, or SEQ ID NO: 39 to its complementary 
sequence. 

27. A polynucleotide sequence comprising at least fifteen nucleotides capable of
hybridizing under stringent conditions to an isolated nucleotide sequence corresponding to
nucleotides 1-42 of SEQ ID NO: 37.

28. A method for detecting a nucleic acid molecule in a biological sample, wherein the 
nucleic acid molecule encodes a RFX4_v3 polypeptide, the method comprising: 

hybridizing a polynucleotide to the nucleic acid molecule to produce a hybridization complex, 
wherein the polynucleotide hybridizes to nucleotides 1-42 of SEQ ID NO: 37, SEQ ID NO: 38, or SEQ 
ID NO: 39; 

detecting the hybridization complex, wherein the presence of the hybridization complex
indicates the presence of a polynucleotide encoding RFX4_v3 in the biological sample.

29. The method of claim 28, wherein the polynucleotide hybridizes to SEQ ID NO: 37. 

30. The method of claim 30, further comprising amplifying the nucleic acid prior to 
hybridizing with the polynucleotide. 

31. A method of identifying a subject at risk of developing RFX4_v3 linked
hydrocephalus, comprising detecting in the subject an abnormality in a RFX4_v3 polypeptide or in a
RFX4_v3 nucleotide sequence that alters expression of the RFX4_v3.

32. The method of claim 31, wherein detecting an abnormality comprises detecting a
mutation in a nucleic acid sequence that encodes RFX4_v3, wherein the mutation is associated with
RFX4_v3 linked hydrocephalus.

33. The method of claim 31, wherein detecting an abnormality in the nucleic acid
comprises performing a hybridization analysis with a nucleic acid probe that detects the mutation in the
RFX4_v3 nucleic acid sequence.

34. The method of claim 31, wherein detecting an abnormality comprises identifying an 
individual carrying a mutated RFX4_v3 allele, wherein the method comprises: 
providing a nucleic acid from a subject, wherein the nucleic acid comprises a RFX4_v3 allele;
and

detecting a mutation in the nucleic acid that results in phenotypic expression of congenital 
hydrocephalus. 


35. The method of claim 34, wherein the mutation is in the RFX4_v3 allele. 

36. The method of claim 31, wherein the method comprises detecting an abnormality in a 
RFX4_v3 polypeptide. 


37. The method of claim 36, wherein the method comprises detecting an abnormality in
expression of the RFX4_v3 polypeptide.

38. The method of claim 37, wherein the abnormality in expression comprises detecting
a reduced expression of the RFX4_v3 polypeptide.

39. The method of claim 36, wherein the method comprises providing a polypeptide
from a subject, and detecting a mutation in the polypeptide sequence, wherein the mutation results in
phenotypic expression of congenital hydrocephalus.


40. The method of claim 31, comprising obtaining a biological sample from the subject,
and detecting in the biological sample the abnormality in the RFX4_v3 polypeptide or in the RFX4_v3
nucleotide sequence.

41. The method of claim 40, wherein the biological sample comprises blood, amniotic
fluid, plasma, or cerebrospinal fluid.

42. The method of claim 40, wherein the method comprises:

providing a polypeptide from a subject, wherein the polypeptide comprises a gene product of a
RFX4_v1 gene; and

detecting a mutation in the polypeptide sequence,

wherein the mutation results in phenotypic expression of congenital hydrocephalus.

43. The method of claim 42, wherein detecting the mutation in the polypeptide sequence
comprises detecting an abnormal protein or level of protein expression using RFX4_v1 specific
antibodies.

44. A kit for determining if a subject is a carrier of a mutated RFX4_v3 gene, wherein 
the kit comprises: 

a reagent that specifically detects a mutation in a RFX4_v3 allele, and

instructions for determining whether the subject is at increased risk of expressing congenital
hydrocephalus if the reagent specifically detects the mutation.

45. The kit of claim 44, wherein the reagent comprises a nucleic acid probe that
specifically hybridizes under stringent conditions to a nucleic acid sequence of SEQ ID NO: 37, SEQ
ID NO: 38 or SEQ ID NO: 39.

46. The kit of claim 44, wherein the reagent comprises an antibody that specifically binds 
the protein expressed by the RFX4_v3 allele. 


47. A method for generating antibodies specific for an RFX4_v3 polypeptide, wherein
the method comprises injecting an animal with an RFX4_v3 polypeptide or an immunogenic portion
thereof.

48. The method of claim 47, further comprising preparing a hybridoma that expresses the
monoclonal antibody.

49. An RFX4_v3 specific antibody for use as a detection or therapeutic agent. 

50. A method for generating a non-human transgenic animal with a knockout for the
RFX4_v3 gene, wherein the method comprises disrupting an RFX4_v3 transcript, the disruption being
sufficient to produce hydrocephalus in the transgenic animal.

51. The method of claim 50, wherein the non-human transgenic animal is a mouse.


52. The method of claim 50, wherein disrupting a RFX4_v3 transcript comprises: 
deleting or substituting any portion of the RFX4_v3 transcript, 

inserting an exogenous gene into the RFX4_v3 transcript, or 
any combination thereof. 


53. The method of claim 50, wherein disrupting the RFX4_v3 transcript comprises 
crossing one non-human transgenic animal with a second non-human transgenic animal. 

54. A transgenic mouse whose somatic and germ cells comprise a disrupted endogenous
RFX4_v3 gene, the disruption being sufficient to produce an increased susceptibility to developing
congenital hydrocephalus.

55. The transgenic mouse of claim 54, wherein the disrupted gene is introduced into the
mouse or an ancestor of the mouse at an embryonic stage, wherein the mouse, if homozygous for the
disrupted gene, does not reproduce.






56. The transgenic mouse of claim 54, wherein the disruption is an insertion within the 
RFX4_v3 gene. 

57. The composition of claim 54, wherein the disruption is a deletion or substitution 
within the RFX4_v3 gene. 

58. A method for screening compounds for the ability to alter RFX4_v3 activity, wherein
the method comprises:

a) providing: 

i) a first polypeptide sequence comprising at least a portion of RFX4_v3, 

ii) a second polypeptide sequence comprising at least a portion of a protein 
known to interact with RFX4_v3, and 

iii) one or more test compounds; and 

b) combining in any order the first polypeptide sequence comprising at least a portion of 
RFX4_v3, the second polypeptide sequence comprising at least a portion of a protein known to interact 
with RFX4_v3, and one or more test compounds under conditions such that the first polypeptide 
sequence, the second polypeptide sequence, and the test compound interact; and 

c) detecting the presence or absence of an interaction between the polypeptide sequence
comprising at least a portion of RFX4_v3 and the polypeptide sequence comprising at least a portion of
a protein known to interact with RFX4_v3.

59. A pharmaceutical composition for treating congenital hydrocephalus comprising: 

a) a therapeutically effective amount of a RFX4_v3 nucleic acid, polypeptide, or a 
therapeutically effective variant or portion thereof; and 

b) a pharmaceutically acceptable carrier. 

60. A pharmaceutical composition for preventing congenital hydrocephalus comprising: 

a) a RFX4_v3 nucleic acid, polypeptide, a variant, or a portion thereof, and 

b) a pharmaceutically acceptable carrier. 

61. A method of treating congenital hydrocephalus in a subject, comprising
administering to the subject a therapeutically effective amount of an agent that increases the presence of a
RFX4_v3 polypeptide in the brain of the subject.

62. The method of claim 61, wherein the method comprises administering exogenous 
RFX4_v3 polypeptide to the subject. 

63. The method of claim 61, wherein the method comprises increasing expression of 
RFX4_v3 polypeptide in the subject. 
64. The method of claim 63, wherein the method comprises introducing into the subject a
vector that expresses the RFX4_v3 polypeptide in the subject. 
[Figure: amino acid sequence alignment of human, mouse and zebrafish RFX4 sequences, divided by exon (Exon 1; Exons 2-5; Exons 6-15; Exons 16-18) and annotated with region labels DBD, B, C and DD. The aligned sequences end at residues 735 (human), 737 (mouse) and 735 (zebrafish). Most of the alignment is not recoverable from the extracted text.]
<110> The Government of the United States of America as
      represented by the Secretary of the Department of Health and
      Human Services

Blackshear, Perry J. 

Zeldin, Darryl C. 

Graves, Joan P. 

Stumpo, Deborah J. 

<120> COMPOSITIONS AND METHODS FOR DIAGNOSTICS AND THERAPEUTICS FOR 
HYDROCEPHALUS 

<130> 4239-64828 

<150> 60/374,184 
<151> 2002-04-19 

<150> 60/388,266 
<151> 2002-06-13 

<160> 39 

<170> PatentIn version 3.2



<210> 1 

<211> 2188 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> CDS 

<222> (106)..(1797)



<400> 1 

tggagaggcc acagctgctg gcttcctggg cttctccaaa ctcctgtgtg tcgccactgc 60

caccggcagg gagccaggag agagacagaa aggggctgag acaga atg atc aaa agg 117
Met Ile Lys Arg
1

aga gcc cac cct ggt gcg gga ggc gac agg acc agg cct cga cgg cgc 165
Arg Ala His Pro Gly Ala Gly Gly Asp Arg Thr Arg Pro Arg Arg Arg
5 10 15 20

cgt tcc act gag agc tgg att gaa aga tgt ctc aac gaa agt gaa aac 213
Arg Ser Thr Glu Ser Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Asn
25 30 35

aaa cgt tat tcc agc cac aca tct ctg ggg aat gtt tct aat gat gaa 261
Lys Arg Tyr Ser Ser His Thr Ser Leu Gly Asn Val Ser Asn Asp Glu
40 45 50

aat gag gaa aaa gaa aat aat aga gca tcc aag ccc cac tcc act cct 309
Asn Glu Glu Lys Glu Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro
55 60 65

gct act ctg caa tgg ctg gag gag aac tat gag att gca gag ggg gtc 357
Ala Thr Leu Gln Trp Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val
70 75 80

tgc atc cct cgc agt gcc ctc tat atg cat tac ctg gat ttc tgc gag 405
Cys Ile Pro Arg Ser Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu
85 90 95 100

aag aat gat acc caa cct gtc aat gct gcc agc ttt gga aag atc ata 453
Lys Asn Asp Thr Gln Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile
105 110 115

agg cag cag ttt cct cag tta acc acc aga aga ctc ggg acc cga gga 501
Arg Gln Gln Phe Pro Gln Leu Thr Thr Arg Arg Leu Gly Thr Arg Gly
120 125 130

cag tca aag tac cat tac tat ggc att gca gtg aaa gaa agc tcc caa 549
Gln Ser Lys Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln
135 140 145

tat tat gat gtg atg tat tcc aag aaa gga gct gcc tgg gtg agt gag 597
Tyr Tyr Asp Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu
150 155 160

acg ggc aag aaa gaa gtg agc aaa cag aca gtg gca tat tca ccc cgg 645
Thr Gly Lys Lys Glu Val Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg
165 170 175 180

tcc aaa ctc gga aca ctg ctg cca gaa ttt ccc aat gtc aaa gat cta 693
Ser Lys Leu Gly Thr Leu Leu Pro Glu Phe Pro Asn Val Lys Asp Leu
185 190 195

aat ctg cca gcc agc ctg cct gag gag aag gtt tct acc ttt att atg 741
Asn Leu Pro Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met
200 205 210

atg tac aga aca cac tgt cag aga ata ctg gac act gta ata aga gcc 789
Met Tyr Arg Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala
215 220 225

aac ttt gat gag gtt caa agt ttc ctt ctg cac ttt tgg caa gga atg 837
Asn Phe Asp Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met
230 235 240

ccg ccc cac atg ctg cct gtg ctg ggc tcc tcc acg gtg gtg aac att 885
Pro Pro His Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile
245 250 255 260

gtc ggc gtg tgt gac tcc atc ctc tac aaa gct atc tcc ggg gtg ctg 933
Val Gly Val Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu
265 270 275

atg ccc act gtg ctg cag gca tta cct gac agc tta act cag gtg att 981
Met Pro Thr Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile
280 285 290

cga aag ttt gcc aag caa ctg gat gag tgg cta aaa gtg gct ctc cac 1029
Arg Lys Phe Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His
295 300 305

gac ctc cca gaa aac ttg cga aac atc aag ttc gaa ttg tcg aga agg 1077
Asp Leu Pro Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg
310 315 320

ttc tcc caa att ctg aga cgg caa aca tca cta aat cat ctc tgc cag 1125
Phe Ser Gln Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln
325 330 335 340

gca tct cga aca gtg atc cac agt gca gac atc acg ttc caa atg ctg 1173
Ala Ser Arg Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu
345 350 355

gaa gac tgg agg aac gtg gac ctg aac agc atc acc aag caa acc ctt 1221
Glu Asp Trp Arg Asn Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu
360 365 370

tac acc atg gaa gac tct cgc gat gag cac cgg aaa ctc atc acc caa 1269
Tyr Thr Met Glu Asp Ser Arg Asp Glu His Arg Lys Leu Ile Thr Gln
375 380 385

tta tat cag gag ttt gac cat ctc ttg gag gag cag tct ccc atc gag 1317
Leu Tyr Gln Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu
390 395 400

tcc tac att gag tgg ctg gat acc atg gtt gac cgc tgt gtt gtg aag 1365
Ser Tyr Ile Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys
405 410 415 420

gtg gct gcc aag aga caa ggg tcc ttg aag aaa gtg gcc cag cag ttc 1413
Val Ala Ala Lys Arg Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe
425 430 435

ctc ttg atg tgg tcc tgt ttc ggc aca agg gtg atc cgg gac atg acc 1461
Leu Leu Met Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr
440 445 450

ttg cac agc gcc ccc agc ttc ggg tct ttt cac cta att cac tta atg 1509
Leu His Ser Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met
455 460 465

ttt gat gac tac gtg ctc tac ctg tta gaa tct ctg cac tgt cag gag 1557
Phe Asp Asp Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu
470 475 480

cgg gcc aat gag ctc atg cga gcc atg aag gga gaa gga agc act gca 1605
Arg Ala Asn Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala
485 490 495 500

gaa gtc cga gaa gag atc atc ttg aca gag gct gcc gca cca acc cct 1653
Glu Val Arg Glu Glu Ile Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro
505 510 515

tca cca gtg cca tcg ttt tct cca gca aaa tct gcc aca tct gtg gaa 1701
Ser Pro Val Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Val Glu
520 525 530

gtg cca cct ccc tct tcc cct gtt agc aat cct tcc cct gag tac act 1749
Val Pro Pro Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr
535 540 545

ggc ctc agc act aca ggt aat gga aag tcc ttc aaa aac ttt ggg tag 1797
Gly Leu Ser Thr Thr Gly Asn Gly Lys Ser Phe Lys Asn Phe Gly
550 555 560

ttaatgtttg aagaaagggc tttctgccag cctgggcaac atagtgagac ttcatttcca 1857
cacacacaaa aagccagaca tcttggctca cacctgtagt cccagctact [illegible] 1917
aggtgggaga attgcttgag cccaggagct acgatcgcac [illegible] [illegible] 1977
gatacagtga gaccttgtct caaaaaaaga aaaacagggc tttctggaaa aacattcttc 2037
tcccacaatc tccaaaagat aatgccaaaa cctgggtatc ttcctggatt tgtgaatgac 2097
gtacaggtat tcatttattc attggtacac attctgtatg ctgctgtttt caagttggca 2157
aattaagcat atgataaaat cccaaaacta a 2188



<210> 2 

<211> 563 

<212> PRT 

<213> Homo sapiens 

<400> 2 

Met Ile Lys Arg Arg Ala His Pro Gly Ala Gly Gly Asp Arg Thr Arg
1 5 10 15

Pro Arg Arg Arg Arg Ser Thr Glu Ser Trp Ile Glu Arg Cys Leu Asn
20 25 30

Glu Ser Glu Asn Lys Arg Tyr Ser Ser His Thr Ser Leu Gly Asn Val
35 40 45

Ser Asn Asp Glu Asn Glu Glu Lys Glu Asn Asn Arg Ala Ser Lys Pro
50 55 60

His Ser Thr Pro Ala Thr Leu Gln Trp Leu Glu Glu Asn Tyr Glu Ile
65 70 75 80

Ala Glu Gly Val Cys Ile Pro Arg Ser Ala Leu Tyr Met His Tyr Leu
85 90 95

Asp Phe Cys Glu Lys Asn Asp Thr Gln Pro Val Asn Ala Ala Ser Phe
100 105 110

Gly Lys Ile Ile Arg Gln Gln Phe Pro Gln Leu Thr Thr Arg Arg Leu
115 120 125

Gly Thr Arg Gly Gln Ser Lys Tyr His Tyr Tyr Gly Ile Ala Val Lys
130 135 140

Glu Ser Ser Gln Tyr Tyr Asp Val Met Tyr Ser Lys Lys Gly Ala Ala
145 150 155 160

Trp Val Ser Glu Thr Gly Lys Lys Glu Val Ser Lys Gln Thr Val Ala
165 170 175

Tyr Ser Pro Arg Ser Lys Leu Gly Thr Leu Leu Pro Glu Phe Pro Asn
180 185 190

Val Lys Asp Leu Asn Leu Pro Ala Ser Leu Pro Glu Glu Lys Val Ser
195 200 205

Thr Phe Ile Met Met Tyr Arg Thr His Cys Gln Arg Ile Leu Asp Thr
210 215 220

Val Ile Arg Ala Asn Phe Asp Glu Val Gln Ser Phe Leu Leu His Phe
225 230 235 240

Trp Gln Gly Met Pro Pro His Met Leu Pro Val Leu Gly Ser Ser Thr
245 250 255

Val Val Asn Ile Val Gly Val Cys Asp Ser Ile Leu Tyr Lys Ala Ile
260 265 270

Ser Gly Val Leu Met Pro Thr Val Leu Gln Ala Leu Pro Asp Ser Leu
275 280 285

Thr Gln Val Ile Arg Lys Phe Ala Lys Gln Leu Asp Glu Trp Leu Lys
290 295 300

Val Ala Leu His Asp Leu Pro Glu Asn Leu Arg Asn Ile Lys Phe Glu
305 310 315 320

Leu Ser Arg Arg Phe Ser Gln Ile Leu Arg Arg Gln Thr Ser Leu Asn
325 330 335

His Leu Cys Gln Ala Ser Arg Thr Val Ile His Ser Ala Asp Ile Thr
340 345 350

Phe Gln Met Leu Glu Asp Trp Arg Asn Val Asp Leu Asn Ser Ile Thr
355 360 365

Lys Gln Thr Leu Tyr Thr Met Glu Asp Ser Arg Asp Glu His Arg Lys
370 375 380

Leu Ile Thr Gln Leu Tyr Gln Glu Phe Asp His Leu Leu Glu Glu Gln
385 390 395 400

Ser Pro Ile Glu Ser Tyr Ile Glu Trp Leu Asp Thr Met Val Asp Arg
405 410 415

Cys Val Val Lys Val Ala Ala Lys Arg Gln Gly Ser Leu Lys Lys Val
420 425 430

Ala Gln Gln Phe Leu Leu Met Trp Ser Cys Phe Gly Thr Arg Val Ile
435 440 445

Arg Asp Met Thr Leu His Ser Ala Pro Ser Phe Gly Ser Phe His Leu
450 455 460

Ile His Leu Met Phe Asp Asp Tyr Val Leu Tyr Leu Leu Glu Ser Leu
465 470 475 480

His Cys Gln Glu Arg Ala Asn Glu Leu Met Arg Ala Met Lys Gly Glu
485 490 495

Gly Ser Thr Ala Glu Val Arg Glu Glu Ile Ile Leu Thr Glu Ala Ala
500 505 510

Ala Pro Thr Pro Ser Pro Val Pro Ser Phe Ser Pro Ala Lys Ser Ala
515 520 525

Thr Ser Val Glu Val Pro Pro Pro Ser Ser Pro Val Ser Asn Pro Ser
530 535 540

Pro Glu Tyr Thr Gly Leu Ser Thr Thr Gly Asn Gly Lys Ser Phe Lys
545 550 555 560

Asn Phe Gly



<210> 3 

<211> 3382 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (110)..(2035)

<400> 3 

aggtgggaag gcagttatga cagttgagaa gtagtagaag acacggaagg cacagaaggc 60 

agacttcgct cagcacaaag aagaattttc tgataaccat actggcaaa atg aac tgg 118 

Met Asn Trp 
1 



get gec ttc gga ggg tct gaa ttc ttc ate cca gaa ggc att cag ata 



166 
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Ala Ala Phe Gly Gly Ser Glu Phe Phe lie Pro Glu Gly lie Gin lie 
5 ^10 15 

gat teg aga tgc cca eta age aga aat ate acg gaa tgg tac cat tac 
Asp Ser Arg Cys Pro Leu Ser Arg Asn lie Thr Glu Trp Tyr His Tyr 
20 25 30 35 

tat ggc att gca gtg aaa gaa age tec caa tat tat gat gtg atg tat 
Tyr Gly lie Ala Val Lys Glu Ser Ser Gin Tyr Tyr Asp Val Met Tyr 
4 0 45 50 

tec aag aaa gga get gee tgg gtg agt gag acg ggc aag aaa gaa gtg 
Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys Lys Glu Val 
55 60 65 



ctg cca gaa ttt ccc aat gtc aaa gat eta aat ctg cca gee age ctg 
Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro Ala Ser Leu 
85 90 95 



cac agt gca gac ate acg ttc caa atg ctg gaa gac tgg agg aac gtg 
His Ser Ala Asp lie Thr Phe Gin Met Leu Glu Asp Trp Arg Asn Val 



214 



262 



310 



age aaa cag aca gtg gca tat tea ccc egg tec aaa etc gga aca ctg 358 
Ser Lys Gin Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu Gly Thr Leu 
70 75 80 



406 



cct gag gag aag gtt tct acc ttt att atg atg tac aga aca cac tgt 454 
Pro Glu Glu Lys Val Ser Thr Phe He Met Met Tyr Arg Thr His Cys 
100 ~ 105 110 115 

cag aga ata ctg gac act gta ata aga gee aac ttt gat gag gtt caa 502 
Gin Arg He Leu Asp Thr Val He Arg Ala Asn Phe Asp Glu Val Gin 
120 125 130 

agt ttc ctt ctg cac ttt tgg caa gga atg ccg ccc cac atg ctg cct 550 
Ser Phe Leu Leu His Phe Trp Gin Gly Met Pro Pro His Met Leu Pro 
135 140 145 

gtg ctg ggc tec tec acg gtg gtg aac att gtc ggc gtg tgt gac tec 598 
Val Leu Gly Ser Ser Thr Val Val Asn He Val Gly Val Cys Asp Ser 
150 155 160 

ate etc tac aaa get ate tec ggg gtg ctg atg ccc act gtg ctg cag 64 6 

He Leu Tyr Lys Ala He Ser Gly Val Leu Met Pro Thr Val Leu Gin 
165 170 175 

gca tta cct gac age tta act cag gtg att cga aag ttt gee aag caa 694 
Ala Leu Pro Asp Ser Leu Thr Gin Val He Arg Lys Phe Ala Lys Gin 
180 185 190 195 

ctg gat gag tgg eta aaa gtg get etc cac gac etc cca gaa aac ttg 742 
Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro Glu Asn Leu 
200 205 210 

cga aac ate aag ttc gaa ttg teg aga agg ttc tec caa att ctg aga 7 90 

Arg Asn He Lys Phe Glu Leu Ser Arg Arg Phe Ser Gin He Leu Arg 
215 220 225 

egg caa aca tea eta aat cat etc tgc cag gca tct cga aca gtg ate 838 
Arg Gin Thr Ser Leu Asn His Leu Cys Gin Ala Ser Arg Thr Val He 
230 235 240 



886 
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245 250 255 

gac ctg aac age ate acc aag caa acc ctt tac acc atg gaa gac tct 934 
Asp Leu Asn Ser lie Thr Lys Gin Thr Leu Tyr Thr Met Glu Asp Ser 
260 265 270 275 

cgc gat gag cac egg aaa etc ate acc caa tta tat cag gag ttt gac 982 
Arg Asp Glu His Arg Lys Leu lie Thr Gin Leu Tyr Gin Glu Phe Asp 
280 285 290 



cat etc ttg gag gag cag tct ccc ate gag tec tac att gag tgg ctg 
His Leu Leu Glu Glu Gin Ser Pro He Glu Ser Tyr He Glu Trp Leu 
295 300 305 

gat acc atg gtt gac cgc tgt gtt gtg aag gtg get gee aag aga cga 
Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala Lys Arg Arg 
310 315 320 



tac ctg tta gaa tct ctg cac tgt cag gag egg gee aat gag etc atg 
Tyr Leu Leu Glu Ser Leu His Cys Gin Glu Arg Ala Asn Glu Leu Met 
375 380 385 



1030 



1078 



ggg tec ttg aag aaa gtg gee cag cag ttc etc ttg atg tgg tec tgt 112 6 

Gly Ser Leu Lys Lys Val Ala Gin Gin Phe Leu Leu Met Trp Ser Cys 
325 330 335 

ttc ggc aca agg gtg ate egg gac atg acc ttg cac age gee ccc age 117 4 

Phe Gly Thr Arg Val He Arg Asp Met Thr Leu His Ser Ala Pro Ser 
340 J " 345 350 355 

ttc ggg tct ttt cac eta att cac tta atg ttt gat gac tac gtg etc 1222 
Phe Gly Ser Phe His Leu He His Leu Met Phe Asp Asp Tyr Val Leu 
360 365 370 



1270 



cga gee atg aag gga gaa gga age act gca gaa gtc cga gaa gag ate 1318 
Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg Glu Glu He 
390 ~ 395 400 

atc ttg aca gag gct gcc gca cca acc cct tca cca gtg cca tcg ttt 1366
Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val Pro Ser Phe
405 410 415

tct cca gca aaa tct gcc aca tct gtg gaa gtg cca cct ccc tct tcc 1414
Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val Pro Pro Pro Ser Ser
420 425 430 435

cct gtt agc aat cct tcc cct gag tac act ggc ctc agc act aca gga 1462
Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser Thr Thr Gly
440 445 450

gca atg cag gct tac acg tgg tct cta aca tac aca gtg acg acg gct 1510
Ala Met Gln Ala Tyr Thr Trp Ser Leu Thr Tyr Thr Val Thr Thr Ala
455 460 465

gct ggg tcc cca gct gag aac tcc caa cag ctg ccc tgt atg agg aac 1558
Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys Met Arg Asn
470 475 480

act cac gtg cct tct tcc tcc gtc aca cac agg ata cca gtt tat ccc 1606
Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro Val Tyr Pro
485 490 495
cac aga gag gaa cat gga tac acg gga agc tat aac tat ggg agc tat 1654
His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr Gly Ser Tyr
500 505 510 515

ggc aac cag cat cct cac ccc atg cag agc cag tat ccg gcc ctc cct 1702
Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro Ala Leu Pro
520 525 530

cat gac aca gct atc tct ggg cca ctc cac tat gcc cct tac cac agg 1750
His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro Tyr His Arg
535 540 545

agc tct gca cag tac cct ttt aat agc ccc act tcc cgg atg gaa cct 1798
Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg Met Glu Pro
550 555 560

tgt ttg atg agc agt act ccc aga ctg cat cct acc cca gtc act ccc 1846
Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro Val Thr Pro
565 570 575

cgc tgg cca gag gtg ccc tca gcc aac acg tgc tac aca aac ccg tct 1894
Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr Asn Pro Ser
580 585 590 595

gtg cat tct gcg agg tac gga aac tct agt gac atg tat aca cct ctg 1942
Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr Thr Pro Leu
600 605 610

aca acg cgc agg aat tct gaa tat gag cac atg caa cac ttt cct ggc 1990
Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His Phe Pro Gly
615 620 625

ttt gct tac atc aac gga gag gcc tct aca gga tgg gct aaa tga 2035
Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala Lys
630 635 640

ctgctatcat aggcatccat atttaatatt aataataata attaataata ataataaacc 2095

caacacccat cccccagaag actttatctc tatacattgt aactcatggg ctattcctaa 2155

gtgcccattt tcctaatgaa catgaggatg ggatcaatgt gggatgaata aactttagtt 2215

cagaaacagg acttactaaa agtcagtggg actgggtttc tgtagccaag ccagacttga 2275

ctgtttctgt agagcactat ctcgggcagg ccattctgtg ccttttccct ctgttccatg 2335

actttgcttt gtgttggcaa ccacttctag taagctactg attttcctgt tgacaaaatc 2395

tctttagtct tgaaggatgg atactggaga cagaatctgg tttgtgttct tggatgggca 2455

cataatttac caagagcatt caccttgcca tctgtcttgt cattgtactg tacaaggaac 2515

agccctcaga cgtgttctgc acatcccttc ttcctggtgg taccatccct atttcctgga 2575

gcaccagggc taaatgggga gctatctgga aactctagat tttctgtcat acccacatct 2635

gtcacagtac ctgcattgtc ttggaatgta agcactgtct tgagggaagg aagaggtctg 2695

ttctgtattg ccttaagttg attgaggttt gtaggagact ggttcttcta catacaagga 2755

tttgtcttaa gtttgcacaa tggctagtgt cagcaaaagg caggagaggg tttttgtttt 2815

ttttttaagt tctatgagaa tgtggattta tggcattgag tatcacactc agctctgctg 2875 

tgttaacttt gtgaaactgg atggaacaaa ctttaactta ccaagcacca agtgtgaaag 2935 

tgactttcac ggttccttca taaaactata ataatatccg acactttgat agaaaaaaat 2995 

tcaaagctgt gcctttgagc ctatactata ctgtgtatgt gtggaaataa aaatgtattg 3055 

tacttttgga gaattttttg taggcatttt tctgtcagat ttgtagtaat ttgtgaggtt 3115 

tgttagagat taatataggt tttctttctg tattataaaa tgcaccaagc aattatggtg 3175 

gacctattac cctatgggta agaaataaat ggaaatatga catcggatgt ttcagcaact 3235 

gttctgtaaa taaaatcttt gatcacacca ctcagtgtga taattgtgtc tacagctaaa 3295 

atggaaatag ttttatctgt acagttgtgc aagatatgaa tggtttcaca ctcaaataaa 3355 

aaatattgaa cccccaaaaa aaaaaaa 3382 

<210> 4 

<211> 641 

<212> PRT 

<213> Homo sapiens 

<400> 4 

Met Asn Trp Ala Ala Phe Gly Gly Ser Glu Phe Phe Ile Pro Glu Gly
1 5 10 15

Ile Gln Ile Asp Ser Arg Cys Pro Leu Ser Arg Asn Ile Thr Glu Trp
20 25 30

Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp
35 40 45

Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys
50 55 60

Lys Glu Val Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu
65 70 75 80

Gly Thr Leu Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro
85 90 95

Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg
100 105 110

Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp
115 120 125

Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His
130 135 140

Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val
145 150 155 160

Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr
165 170 175

Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe
180 185 190

Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro
195 200 205

Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln
210 215 220

Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg
225 230 235 240

Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp
245 250 255

Arg Asn Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met
260 265 270

Glu Asp Ser Arg Asp Glu His Arg Lys Leu Ile Thr Gln Leu Tyr Gln
275 280 285

Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile
290 295 300

Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala
305 310 315 320

Lys Arg Arg Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met
325 330 335

Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser
340 345 350

Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp
355 360 365

Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn
370 375 380

Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg
385 390 395 400

Glu Glu Ile Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val
405 410 415

Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val Pro Pro
420 425 430

Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser
435 440 445

Thr Thr Gly Ala Met Gln Ala Tyr Thr Trp Ser Leu Thr Tyr Thr Val
450 455 460

Thr Thr Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys
465 470 475 480

Met Arg Asn Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro
485 490 495

Val Tyr Pro His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr
500 505 510

Gly Ser Tyr Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro
515 520 525

Ala Leu Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro
530 535 540

Tyr His Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg
545 550 555 560

Met Glu Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro
565 570 575

Val Thr Pro Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr
580 585 590

Asn Pro Ser Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr
595 600 605

Thr Pro Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His
610 615 620

Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala
625 630 635 640

Lys



<210> 5 

<211> 2842 

<212> DNA 

<213> Mus musculus 

<220> 

<221> CDS 

<222> (307)..(2520)

<400> 5

ttttgacggg tttggctttg cccgactgga ttactgagtg tcccctcgct cgttcgctcg 60

ccctctcgct ctctccttca gctctagctt cdttccttcc ctcgcttctt cgcctctttt 120

ctttccacta gttctttctt ttcccctttt atccttttgc cctctcaccc accgtctccc 180

cctctctctc tcgctatccc ttccttcctt atttcttccc tcccttcctc cctgggcatc 240

tctagcacag gggatcccca aatatcagga cttttggggg gcgtctgtgc tgtccatggg 300

aagagc atg cat tgt ggg tta ctg gag gaa ccc gac atg gat tcc aca 348
Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr
1 5 10

gag agc tgg att gaa aga tgt ctc aat gaa agc gag aat aaa cgc tat 396
Glu Ser Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Asn Lys Arg Tyr
15 20 25 30

tcc agt cac aca tct ctg ggg aat gtg tct aat gat gaa aat gag gaa 444
Ser Ser His Thr Ser Leu Gly Asn Val Ser Asn Asp Glu Asn Glu Glu
35 40 45



aaa gaa aat aac aga gca tcc aag ccc cac tcc acg ccg gcc acc ctg 492
Lys Glu Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu
50 55 60

caa tgg ctg gag gaa aac tat gag att gct gag ggc gtc tgc atc ccc 540
Gln Trp Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro
65 70 75

cgc agc gcc ctc tac atg cac tac ctg gat ttc tgt gag aag aac gac 588
Arg Ser Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Asn Asp
80 85 90

act cag cct gtc aat gct gcc agc ttt ggg aag atc ata agg cag cag 636
Thr Gln Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln
95 100 105 110

ttt cct cag cta acc acc aga aga ctc ggg acc ggg acc cga gga cag 684
Phe Pro Gln Leu Thr Thr Arg Arg Leu Gly Thr Gly Thr Arg Gly Gln
115 120 125

tca aag tac cat tac tat ggc ata gcg gtg aag gag agc tcc cag tat 732
Ser Lys Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr
130 135 140

tat gat gtg atg tac tca aag aaa gga gct gcc tgg gtg agc gag acg 780
Tyr Asp Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr
145 150 155

ggc aag aga gaa gtc acc aag cag acg gtg gca tat tct ccc cgg tcc 828
Gly Lys Arg Glu Val Thr Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser
160 165 170

aag ctt ggg aca ttg ctg cca gac ttt cca aac gtc aaa gac cta aat 876
Lys Leu Gly Thr Leu Leu Pro Asp Phe Pro Asn Val Lys Asp Leu Asn
175 180 185 190

ctg cca gcc agt ctt cct gag gag aag gtg tct acc ttt att atg atg 924
Leu Pro Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met
195 200 205

tac aga aca cac tgt cag aga ata ctg gac act gta ata aga gcc aac 972
Tyr Arg Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn
210 215 220

ttt gat gag gtt caa agt ttc ctt ctg cac ttt tgg caa ggg atg ccg 1020
Phe Asp Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro
225 230 235

ccc cac atg ctg ccc gtg cta ggc tcc tcc acg gtg gtg aac atc gtg 1068
Pro His Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val
240 245 250

ggt gtg tgt gac tcc atc ctc tac aaa gcc atc tcc ggt gtg ttg atg 1116
Gly Val Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met
255 260 265 270

ccc acg gtg ctg cag gcg ttg ccg gac agc tta act cag gtg atc cga 1164
Pro Thr Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg
275 280 285

aag ttt gcc aag cag ctg gac gag tgg ctg aaa gtg gct ctc cac gat 1212
Lys Phe Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp
290 295 300

ctc ccg gaa aac ctg aga aac atc aaa ttt gaa tta tca agg agg ttt 1260
Leu Pro Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe
305 310 315

tcc caa atc cta agg agg caa aca tcg ctg aac cat ctg tgc cag gca 1308
Ser Gln Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala
320 325 330

tct cga acg gtg atc cac agt gca gac atc acg ttc cag atg ctg gag 1356
Ser Arg Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu
335 340 345 350

gac tgg agg aat gtg gac ctg agt agc atc acc aag cag act ctg tat 1404
Asp Trp Arg Asn Val Asp Leu Ser Ser Ile Thr Lys Gln Thr Leu Tyr
355 360 365

acc atg gag gac tct cgg gat gag cac cgc aga ctc atc atc cag ttg 1452
Thr Met Glu Asp Ser Arg Asp Glu His Arg Arg Leu Ile Ile Gln Leu
370 375 380

tac cag gag ttt gac cac ctg ctg gag gaa cag tcc ccc atc gag tct 1500
Tyr Gln Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser
385 390 395

tac ata gaa tgg ctg gat acc atg gta gac cga tgc gtt gta aag gtg 1548
Tyr Ile Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val
400 405 410

gct gcc aag aga caa ggg tct ctg aag aaa gta gcc caa cag ttc ctg 1596
Ala Ala Lys Arg Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu
415 420 425 430

ctg atg tgg tct tgc ttt ggt acg agg gtg atc cgg gac atg acc ttg 1644
Leu Met Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu
435 440 445

cac agt gcc ccc agc ttc ggg tct ttt cac ctg att cac ctg atg ttc 1692
His Ser Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe
450 455 460

gac gac tac gtg ctc tac ttg cta gaa tct ctg cat tgt cag gag cgg 1740
Asp Asp Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg
465 470 475

gcc aac gag ctc atg cga gcc atg aaa gga gaa gga agc act gca gaa 1788
Ala Asn Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu
480 485 490

gcc cag gaa gag att atc ttg aca gag gct acc cca cca acc cct tca 1836
Ala Gln Glu Glu Ile Ile Leu Thr Glu Ala Thr Pro Pro Thr Pro Ser
495 500 505 510

cct ggt cca tca ttt tct cca gca aag tct gcc aca tct gtg gag gtg 1884
Pro Gly Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val
515 520 525

cca cct ccc tcc tcc cct gtc agc aac cca tcc ccc gaa tac act ggc 1932
Pro Pro Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly
530 535 540

ctt agc aca gca gga gcg atg cag tca tat acg tgg tcg cta aca tat 1980
Leu Ser Thr Ala Gly Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr
545 550 555

aca gta aca acg gct gca ggg tca ccg gct gag aac tcc caa caa cta 2028
Thr Val Thr Thr Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu
560 565 570

ccc tgt atg agg agc acc cat atg cct tct tcc tcc gtc aca cac agg 2076
Pro Cys Met Arg Ser Thr His Met Pro Ser Ser Ser Val Thr His Arg
575 580 585 590

ata cca gtc tac tcc cac aga gag gag cat ggg tac acg gga agc tat 2124
Ile Pro Val Tyr Ser His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr
595 600 605
aac tac ggg agc tat ggc aac cag cat cct cac cca ctg cag aac cag 2172
Asn Tyr Gly Ser Tyr Gly Asn Gln His Pro His Pro Leu Gln Asn Gln
610 615 620

tat cca gcc ttg cct cat gac aca gcc atc tct ggg cct ctc cac tat 2220
Tyr Pro Ala Leu Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr
625 630 635

tcc cct tac cac agg agc tct gcc cag tac cct ttc aat agc ccc act 2268
Ser Pro Tyr His Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr
640 645 650

tcc agg atg gaa cct tgt ttg atg agc agt act ccc agg ctg cat cct 2316
Ser Arg Met Glu Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro
655 660 665 670

acc cca gtg act ccc cga tgg cca gag gtg ccg act gcc aac gca tgc 2364
Thr Pro Val Thr Pro Arg Trp Pro Glu Val Pro Thr Ala Asn Ala Cys
675 680 685

tac aca agc cca tct gtg cat tcc acg agg tat gga aac tct agt gac 2412
Tyr Thr Ser Pro Ser Val His Ser Thr Arg Tyr Gly Asn Ser Ser Asp
690 695 700

atg tac acc ccg ctg acc acg cgc agg aat tct gag tat gag cac atg 2460
Met Tyr Thr Pro Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met
705 710 715

caa cac ttt cct ggc ttt gct tac atc aac gga gag gcc tcc act gga 2508
Gln His Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly
720 725 730

tgg gct aag tga ctgctttcat agaaatccat atttaatatt aataattaat 2560
Trp Ala Lys
735

aataataata aacccagtac ccaccctcca gaagacttta tttcaataca tcataacttg 2620

cgggctgacc taagcatcca ttctcctaat gaacaagagg atgttcaatg tggagtgaat 2680

agactttagt tcagaaacag gagtcactaa aagtcagtgg gattgggttt ctgtagccaa 2740

gccagacttg actgtttcta tagagcacta tcttgggcag gccaatctgt gcctttcccc 2800

tctgttccat gaccttgcat ggcaactact ccttgtatag gg 2842

<210> 6

<211> 737

<212> PRT

<213> Mus musculus

<400> 6

Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr Glu Ser
1 5 10 15

Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Asn Lys Arg Tyr Ser Ser
20 25 30

His Thr Ser Leu Gly Asn Val Ser Asn Asp Glu Asn Glu Glu Lys Glu
35 40 45

Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu Gln Trp
50 55 60

Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro Arg Ser
65 70 75 80

Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Asn Asp Thr Gln
85 90 95

Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln Phe Pro
100 105 110

Gln Leu Thr Thr Arg Arg Leu Gly Thr Gly Thr Arg Gly Gln Ser Lys
115 120 125

Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp
130 135 140

Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys
145 150 155 160

Arg Glu Val Thr Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu
165 170 175

Gly Thr Leu Leu Pro Asp Phe Pro Asn Val Lys Asp Leu Asn Leu Pro
180 185 190

Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg
195 200 205

Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp
210 215 220

Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His
225 230 235 240

Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val
245 250 255

Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr
260 265 270

Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe
275 280 285

Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro
290 295 300

Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln
305 310 315 320

Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg
325 330 335

Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp
340 345 350

Arg Asn Val Asp Leu Ser Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met
355 360 365

Glu Asp Ser Arg Asp Glu His Arg Arg Leu Ile Ile Gln Leu Tyr Gln
370 375 380

Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile
385 390 395 400

Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala
405 410 415

Lys Arg Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met
420 425 430

Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser
435 440 445

Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp
450 455 460

Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn
465 470 475 480

Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Ala Gln
485 490 495

Glu Glu Ile Ile Leu Thr Glu Ala Thr Pro Pro Thr Pro Ser Pro Gly
500 505 510

Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val Pro Pro
515 520 525

Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser
530 535 540

Thr Ala Gly Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val
545 550 555 560

Thr Thr Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys
565 570 575

Met Arg Ser Thr His Met Pro Ser Ser Ser Val Thr His Arg Ile Pro
580 585 590

Val Tyr Ser His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr
595 600 605

Gly Ser Tyr Gly Asn Gln His Pro His Pro Leu Gln Asn Gln Tyr Pro
610 615 620

Ala Leu Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ser Pro
625 630 635 640

Tyr His Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg
645 650 655

Met Glu Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro
660 665 670

Val Thr Pro Arg Trp Pro Glu Val Pro Thr Ala Asn Ala Cys Tyr Thr
675 680 685

Ser Pro Ser Val His Ser Thr Arg Tyr Gly Asn Ser Ser Asp Met Tyr
690 695 700

Thr Pro Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His
705 710 715 720

Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala
725 730 735

Lys



<210> 7

<211> 3603

<212> DNA

<213> Homo sapiens

<220>

<221> CDS

<222> (68)..(2275)

<400> 7

ctctagcaca ggggatcccc aaacatcagg acttttgggg ggcgcctgtg ctgtccatgg 60

gaagagc atg cat tgt ggg tta ctg gag gaa ccc gac atg gat tcc aca 109
Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr
1 5 10

gag agc tgg att gaa aga tgt ctc aac gaa agt gaa aac aaa cgt tat 157
Glu Ser Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Asn Lys Arg Tyr
15 20 25 30

tcc agc cac aca tct ctg ggg aat gtt tct aat gat gaa aat gag gaa 205
Ser Ser His Thr Ser Leu Gly Asn Val Ser Asn Asp Glu Asn Glu Glu
35 40 45

aaa gaa aat aat aga gca tcc aag ccc cac tcc act cct gct act ctg 253
Lys Glu Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu
50 55 60

caa tgg ctg gag gag aac tat gag att gca gag ggg gtc tgc atc cct 301
Gln Trp Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro
65 70 75

cgc agt gcc ctc tat atg cat tac ctg gat ttc tgc gag aag aat gat 349
Arg Ser Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Asn Asp
80 85 90

acc caa cct gtc aat gct gcc agc ttt gga aag atc ata agg cag cag 397
Thr Gln Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln
95 100 105 110

ttt cct cag tta acc acc aga aga ctc ggg acc cga gga cag tca aag 445
Phe Pro Gln Leu Thr Thr Arg Arg Leu Gly Thr Arg Gly Gln Ser Lys
115 120 125

tac cat tac tat ggc att gca gtg aaa gaa agc tcc caa tat tat gat 493
Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp
130 135 140

gtg atg tat tcc aag aaa gga gct gcc tgg gtg agt gag acg ggc aag 541
Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys
145 150 155

aaa gaa gtg agc aaa cag aca gtg gca tat tca ccc cgg tcc aaa ctc 589
Lys Glu Val Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu
160 165 170

gga aca ctg ctg cca gaa ttt ccc aat gtc aaa gat cta aat ctg cca 637
Gly Thr Leu Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro
175 180 185 190

gcc agc ctg cct gag gag aag gtt tct acc ttt att atg atg tac aga 685
Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg
195 200 205

aca cac tgt cag aga ata ctg gac act gta ata aga gcc aac ttt gat 733
Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp
210 215 220

gag gtt caa agt ttc ctt ctg cac ttt tgg caa gga atg ccg ccc cac 781
Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His
225 230 235

atg ctg cct gtg ctg ggc tcc tcc acg gtg gtg aac att gtc ggc gtg 829
Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val
240 245 250

tgt gac tcc atc ctc tac aaa gct atc tcc ggg gtg ctg atg ccc act 877
Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr
255 260 265 270

gtg ctg cag gca tta cct gac agc tta act cag gtg att cga aag ttt 925
Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe
275 280 285

gcc aag caa ctg gat gag tgg cta aaa gtg gct ctc cac gac ctc cca 973
Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro
290 295 300

gaa aac ttg cga aac atc aag ttc gaa ttg tcg aga agg ttc tcc caa 1021
Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln
305 310 315

att ctg aga cgg caa aca tca cta aat cat ctc tgc cag gca tct cga 1069
Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg
320 325 330

aca gtg atc cac agt gca gac atc acg ttc caa atg ctg gaa gac tgg 1117
Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp
335 340 345 350

agg aac gtg gac ctg aac agc atc acc aag caa acc ctt tac acc atg 1165
Arg Asn Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met
355 360 365

gaa gac tct cgc gat gag cac cgg aaa ctc atc acc caa tta tat cag 1213
Glu Asp Ser Arg Asp Glu His Arg Lys Leu Ile Thr Gln Leu Tyr Gln
370 375 380

gag ttt gac cat ctc ttg gag gag cag tct ccc atc gag tcc tac att 1261
Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile
385 390 395

gag tgg ctg gat acc atg gtt gac cgc tgt gtt gtg aag gtg gct gcc 1309
Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala
400 405 410

aag aga caa ggg tcc ttg aag aaa gtg gcc cag cag ttc ctc ttg atg 1357
Lys Arg Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met
415 420 425 430

tgg tcc tgt ttc ggc aca agg gtg atc cgg gac atg acc ttg cac agc 1405
Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser
435 440 445

gcc ccc agc ttc ggg tct ttt cac cta att cac tta atg ttt gat gac 1453
Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp
450 455 460

tac gtg ctc tac ctg tta gaa tct ctg cac tgt cag gag cgg gcc aat 1501
Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn
465 470 475

gag ctc atg cga gcc atg aag gga gaa gga agc act gca gaa gtc cga 1549
Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg
480 485 490

gaa gag atc atc ttg aca gag gct gcc gca cca acc cct tca cca gtg 1597
Glu Glu Ile Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val
495 500 505 510

cca tcg ttt tct cca gca aaa tct gcc aca tct atg gaa gtg cca cct 1645
Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Met Glu Val Pro Pro
515 520 525

ccc tct tcc cct gtt agc aat cct tcc cct gag tac act ggc ctc agc 1693
Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser
530 535 540

act aca gga gca atg cag tct tac acg tgg tct cta aca tac aca gtg 1741
Thr Thr Gly Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val
545 550 555

acg acg gct gct ggg tcc cca gct gag aac tcc caa cag ctg ccc tgt 1789
Thr Thr Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys
560 565 570

atg agg aac act cat gtg cct tct tcc tcc gtc aca cac agg ata cca 1837
Met Arg Asn Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro
575 580 585 590

gtt tat ccc cac aga gag gaa cat gga tac acg gga agc tat aac tat 1885
Val Tyr Pro His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr
595 600 605

ggg agc tat ggc aac cag cat cct cac ccc atg cag agc cag tat ccg 1933
Gly Ser Tyr Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro
610 615 620

gcc ctc cct cat gac aca gct atc tct ggg cca ctc cac tat gcc cct 1981
Ala Leu Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro
625 630 635

tac cac agg agc tct gca cag tac cct ttt aat agc ccc act tcc cgg 2029
Tyr His Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg
640 645 650

atg gaa cct tgt ttg atg agc agt act ccc aga ctg cat cct acc cca 2077
Met Glu Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro
655 660 665 670

gtc act ccc cgc tgg cca gag gtg ccc tca gcc aac acg tgc tac aca 2125
Val Thr Pro Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr
675 680 685

agc ccg tct gtg cat tct gcg agg tac gga aac tct agt gac atg tat 2173
Ser Pro Ser Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr
690 695 700

aca cct ctg aca acg cgc agg aat tct gaa tat gag cac atg caa cac 2221
Thr Pro Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His
705 710 715

ttt cct ggc ttt gct tac atc aac gga gag gcc tct aca gga tgg gct 2269
Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala
720 725 730

aaa tga ctgctatcat aggcatccat atttaatatt aataataata attaataata 2325
Lys
735

ataataaacc caacacccat cccccagaag actttatctc tatacattgt aactcatggg 2385 

ctattcctaa gtgcccattt tcctaatgaa catgaggatg ggatcaatgt gggatgaata 2445 

aactttagtt cagaaacagg acttactaaa agtcagtggg actgggtttc tgtagccaag 2505 

ccagacttga ctgtttctgt agagcactat ctcgggcagg ccattctgtg ccttttccct 2565

ctgttccatg actttgcttt gtgttggcaa ccacttctag taagctactg attttcctgt 2625

tgacaaaatc tctttagtct tgaaggatgg atactggaga cagaatctgg tttgtgttct 2685

tggatgggca cataatttac caagagcatt caccttgcca tctgtcttgt cattgtactg 2745 

tacaaggaac agccctcaga cgtgttctgc acatcccttc ttcctggtgg taccatccct 2805 

atttcctgga gcaccagggc taaatgggga gctatctgga aactctagat tttctgtcat 2865 

acccacatct gtcacagtac ctgcattgtc ttggaatgta agcactgtct tgagggaagg 2925

aagaggtctg ttctgtattg ccttaagttg attgaggttt gtaggagact ggttcttcta 2985 

catacaagga tttgtcttaa gtttgcacaa tggctagtgt cagcaaaagg caggagaggg 3045 

tttttgtttt ttttttaagt tctatgagaa tgtggattta tggcattgag tatcacactc 3105 

agctctgctg tgttaacttt gtgaaactgg atggaacaaa ctttaactta ccaagcacca 3165

agtgtgaaag tgactttcac ggttccttca taaaactata ataatatccg acactttgat 3225 

agaaaaaaat tcaaagctgt gcctttgagc ctatactata ctgtgtatgt gtggaaataa 3285

aaatgtattg tacttttgga gaattttttg taggcatttt tctgtcagat ttgtagtaat 3345 

ttgtgaggtt tgttagagat taatataggt tttctttctg tattataaaa tgcaccaagc 3405 

aattatggtg gacctattac cctatgggta agaaataaat ggaaatatga catcggatgt 3465

ttcagcaact gttctgtaaa taaaatcttt gatcacacca ctcagtgtga taattgtgtc 3525 

tacagctaaa atggaaatag ttttatctgt acagttgtgc aagatatgaa tggtttcaca 3585 

ctcaaataaa aaatattg 3603 
<210> 8

<211> 735

<212> PRT

<213> Homo sapiens

<400> 8

Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr Glu Ser
1 5 10 15

Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Asn Lys Arg Tyr Ser Ser
20 25 30

His Thr Ser Leu Gly Asn Val Ser Asn Asp Glu Asn Glu Glu Lys Glu
35 40 45

Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu Gln Trp
50 55 60

Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro Arg Ser
65 70 75 80

Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Asn Asp Thr Gln
85 90 95

Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln Phe Pro
100 105 110

Gln Leu Thr Thr Arg Arg Leu Gly Thr Arg Gly Gln Ser Lys Tyr His
115 120 125

Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp Val Met
130 135 140

Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys Lys Glu
145 150 155 160

Val Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu Gly Thr
165 170 175

Leu Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro Ala Ser
180 185 190

Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg Thr His
195 200 205

Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp Glu Val
210 215 220

Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His Met Leu
225 230 235 240

Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val Cys Asp
245 250 255

Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr Val Leu
260 265 270

Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe Ala Lys
275 280 285

Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro Glu Asn
290 295 300

Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln Ile Leu
305 310 315 320

Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg Thr Val
325 330 335

Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp Arg Asn
340 345 350

Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met Glu Asp
355 360 365

Ser Arg Asp Glu His Arg Lys Leu Ile Thr Gln Leu Tyr Gln Glu Phe
370 375 380

Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile Glu Trp
385 390 395 400

Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala Lys Arg
405 410 415

Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met Trp Ser
420 425 430

Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser Ala Pro
435 440 445

Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp Tyr Val
450 455 460

Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn Glu Leu
465 470 475 480

Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg Glu Glu
485 490 495

Ile Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val Pro Ser
500 505 510

Phe Ser Pro Ala Lys Ser Ala Thr Ser Met Glu Val Pro Pro Pro Ser
515 520 525

Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser Thr Thr
530 535 540

Gly Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val Thr Thr
545 550 555 560

Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys Met Arg
565 570 575

Asn Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro Val Tyr
580 585 590

Pro His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr Gly Ser
595 600 605

Tyr Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro Ala Leu
610 615 620

Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro Tyr His
625 630 635 640

Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg Met Glu
645 650 655

Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro Val Thr
660 665 670

Pro Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr Ser Pro
675 680 685

Ser Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr Thr Pro
690 695 700

Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His Phe Pro
705 710 715 720

Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala Lys
725 730 735



<210> 9
<211> 3003
<212> DNA
<213> Danio rerio

<220>
<221> CDS
<222> (89)..(2296)

<400> 9

ccacgcgtcc gcggagggaa ctgaggaggg ggagtcctgc acggagccat tttctcagag 60

gagcccctgg aacgtgcatg ggaagagg atg ctt tgt ggg ctg ctg gaa gag 112
                               Met Leu Cys Gly Leu Leu Glu Glu
                               1               5

cct gac atg gat tcc aca gag agc tgg att gaa aga tgt ctg aac gaa 160
Pro Asp Met Asp Ser Thr Glu Ser Trp Ile Glu Arg Cys Leu Asn Glu
10 15 20

agc gag agc aag cgc ttc tcc agc cac tct tct att gga aat att tcc 208
Ser Glu Ser Lys Arg Phe Ser Ser His Ser Ser Ile Gly Asn Ile Ser
25 30 35 40

aac gac gaa aac gaa gag aag gaa aat aac cga gca tct aag cca cat 256
Asn Asp Glu Asn Glu Glu Lys Glu Asn Asn Arg Ala Ser Lys Pro His
45 50 55

tca aca cct gct aca tta caa tgg ttg gag gag aac tac gag atc gca 304
Ser Thr Pro Ala Thr Leu Gln Trp Leu Glu Glu Asn Tyr Glu Ile Ala
60 65 70

gag ggt gtg tgt att cct cgc atc gcc ctg tac atg cac tac ctg gac 352
Glu Gly Val Cys Ile Pro Arg Ile Ala Leu Tyr Met His Tyr Leu Asp
75 80 85

ttc tgc gaa aaa ctg gac tca cag cca gtc aat gct gca agc ttc gga 400
Phe Cys Glu Lys Leu Asp Ser Gln Pro Val Asn Ala Ala Ser Phe Gly
90 95 100

aag ata ata agg cag cag ttt cct cag ttg acc acg cgg aga tta gga 448
Lys Ile Ile Arg Gln Gln Phe Pro Gln Leu Thr Thr Arg Arg Leu Gly
105 110 115 120

act aga ggt caa tca aag tat cat tac tat ggc atc gca gtg aag gag 496
Thr Arg Gly Gln Ser Lys Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu
125 130 135

agc tcc cag tac tac gat gtg atg tac tct aaa aag ggc gct gcg tgg 544
Ser Ser Gln Tyr Tyr Asp Val Met Tyr Ser Lys Lys Gly Ala Ala Trp
140 145 150

gtg aac gag acg ggc aag aaa gag gtc acc aaa cag aca gta gcg tat 592
Val Asn Glu Thr Gly Lys Lys Glu Val Thr Lys Gln Thr Val Ala Tyr
155 160 165

tca ccg cgc tcc aag ctg ggc act ctc ctg cca gac ttt cca aat gtc 640
Ser Pro Arg Ser Lys Leu Gly Thr Leu Leu Pro Asp Phe Pro Asn Val
170 175 180

aaa gac cta aat ctg ccc gcc agt ctg cca gag gag aag gtc tcg acc 688
Lys Asp Leu Asn Leu Pro Ala Ser Leu Pro Glu Glu Lys Val Ser Thr
185 190 195 200

ttt att atg atg tac aga act cac tgc cag agg ata ctg gat act gtc 736
Phe Ile Met Met Tyr Arg Thr His Cys Gln Arg Ile Leu Asp Thr Val
205 210 215

ata cgc gcc aac ttc gat gag gtt cag agc ttc ctg ttg cac ttt tgg 784
Ile Arg Ala Asn Phe Asp Glu Val Gln Ser Phe Leu Leu His Phe Trp
220 225 230

cag ggc atg ccg ccc cac atg ctc cct gtc ctg ggc tct tct aca gtg 832
Gln Gly Met Pro Pro His Met Leu Pro Val Leu Gly Ser Ser Thr Val
235 240 245

gtc aac ata gtg ggt gtg tgt gac tcc ata ttg tac aag gcc atc tca 880
Val Asn Ile Val Gly Val Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser
250 255 260

ggc gtc ctc atg ccc acc gtc cta caa gct ctg cct gac agc ctc act 928
Gly Val Leu Met Pro Thr Val Leu Gln Ala Leu Pro Asp Ser Leu Thr
265 270 275 280

cag gtg atc agg aag ttt gcc aag cag ctg gac gag tgg ctg aag gtg 976
Gln Val Ile Arg Lys Phe Ala Lys Gln Leu Asp Glu Trp Leu Lys Val
285 290 295

gct tta cat gac ctg ccc gaa aac ctg cgc aac att aag ttt gaa ttg 1024
Ala Leu His Asp Leu Pro Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu
300 305 310

tca aga aga ttt tct cag att ctc aaa cga caa aca tca tta aac cac 1072
Ser Arg Arg Phe Ser Gln Ile Leu Lys Arg Gln Thr Ser Leu Asn His
315 320 325

ctc tgt cag gcc tct cga aca gtg atc cac agt gca gac atc acc ttt 1120
Leu Cys Gln Ala Ser Arg Thr Val Ile His Ser Ala Asp Ile Thr Phe
330 335 340

cag atg ctc gag gac tgg agg aac gta gac ctc aac agc atc act aaa 1168
Gln Met Leu Glu Asp Trp Arg Asn Val Asp Leu Asn Ser Ile Thr Lys
345 350 355 360

caa act ctt tat act atg gaa gac tcc aga gaa gac cag agg aga ctc 1216
Gln Thr Leu Tyr Thr Met Glu Asp Ser Arg Glu Asp Gln Arg Arg Leu
365 370 375

atc atc caa ttg tat caa gaa ttt gac aga ctg cta gag gac cag tct 1264
Ile Ile Gln Leu Tyr Gln Glu Phe Asp Arg Leu Leu Glu Asp Gln Ser
380 385 390

cca att gaa gcc tac atc gag tgg ctg gac tct atg gtg gag aga tgt 1312
Pro Ile Glu Ala Tyr Ile Glu Trp Leu Asp Ser Met Val Glu Arg Cys
395 400 405

gtt gtg agg gtg gcg ggg aag aga ccc gga tct ctg aag agg gta gct 1360
Val Val Arg Val Ala Gly Lys Arg Pro Gly Ser Leu Lys Arg Val Ala
410 415 420

cag cag ttc ctg ctc atg tgg tcg tgt ttt ggg aca aga gtt atc cgg 1408
Gln Gln Phe Leu Leu Met Trp Ser Cys Phe Gly Thr Arg Val Ile Arg
425 430 435 440

gat atg acg ctg cat agt gca cca agc ttt ggc tcg ttc cat ctg att 1456
Asp Met Thr Leu His Ser Ala Pro Ser Phe Gly Ser Phe His Leu Ile
445 450 455

cac ctc atg ttt gat gac tat gta ctt tac ctg ctt gaa tct ctg cac 1504
His Leu Met Phe Asp Asp Tyr Val Leu Tyr Leu Leu Glu Ser Leu His
460 465 470

tgc caa gag aga gcc aat gaa ctg atg agg gcg atg aaa gga gag ggc 1552
Cys Gln Glu Arg Ala Asn Glu Leu Met Arg Ala Met Lys Gly Glu Gly
475 480 485

gca cca gca gat act gga gaa gag ctg atg ctg atg agc tcc act cca 1600
Ala Pro Ala Asp Thr Gly Glu Glu Leu Met Leu Met Ser Ser Thr Pro
490 495 500

aca tct acg tca cct gga ccc tac tct cct gcc aaa tct gtt cac tcg 1648
Thr Ser Thr Ser Pro Gly Pro Tyr Ser Pro Ala Lys Ser Val His Ser
505 510 515 520

gtg ggc gta ccc gca gta ggg tcc ccc aat tca gcc cag tct ccg gag 1696
Val Gly Val Pro Ala Val Gly Ser Pro Asn Ser Ala Gln Ser Pro Glu
525 530 535

tac acc agc ata tcg gcc aca aca gga gct gtt cag tca tat acc tgg 1744
Tyr Thr Ser Ile Ser Ala Thr Thr Gly Ala Val Gln Ser Tyr Thr Trp
540 545 550

tcc ctt aca tac aca gtg aca act tca ggc ggc agc cca acc gag ccc 1792
Ser Leu Thr Tyr Thr Val Thr Thr Ser Gly Gly Ser Pro Thr Glu Pro
555 560 565

gga tcc cag ctg tcc tgc atg aga ggc gga cct gcg tta cac gga tca 1840
Gly Ser Gln Leu Ser Cys Met Arg Gly Gly Pro Ala Leu His Gly Ser
570 575 580

tcc tcc gca cac cgg atg cca gtt tac cca cat cgg gat gag cac ggg 1888
Ser Ser Ala His Arg Met Pro Val Tyr Pro His Arg Asp Glu His Gly
585 590 595 600

tac act ggc agc tat aat tac agc agc tac gca aac cag cac cat cat 1936
Tyr Thr Gly Ser Tyr Asn Tyr Ser Ser Tyr Ala Asn Gln His His His
605 610 615

gcc att cag agt caa tac tcc agt tta acc cat gaa gca ggg ctg ccc 1984
Ala Ile Gln Ser Gln Tyr Ser Ser Leu Thr His Glu Ala Gly Leu Pro
620 625 630

act cct ttg cat tat tcc tca tac cac cgc acc tcc gca cag tat ccg 2032
Thr Pro Leu His Tyr Ser Ser Tyr His Arg Thr Ser Ala Gln Tyr Pro
635 640 645

ctc aac agt caa atg tcc aga atg gag tcg tgt cta atg agc ggc tct 2080
Leu Asn Ser Gln Met Ser Arg Met Glu Ser Cys Leu Met Ser Gly Ser
650 655 660

cct ctc cta cac tcc agt cca gtg acc cct cga tgg ccc gat gtg ccc 2128
Pro Leu Leu His Ser Ser Pro Val Thr Pro Arg Trp Pro Asp Val Pro
665 670 675 680

tct gcc aac agc tgt tac tcc agt ccc acc gtc cac gca tcc cgc tac 2176
Ser Ala Asn Ser Cys Tyr Ser Ser Pro Thr Val His Ala Ser Arg Tyr
685 690 695

tcc acc gga gac atg tac tcg ccc ctt gcc cca cgc agg aac tct gaa 2224
Ser Thr Gly Asp Met Tyr Ser Pro Leu Ala Pro Arg Arg Asn Ser Glu
700 705 710

tac gag cac gca caa cac ttt cca gga ttc gcc tat att aac ggg gag 2272
Tyr Glu His Ala Gln His Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu
715 720 725

gcc acg acc gga tgg gca aaa tga taaaccagcg gtggtccata tttaacacta 2326
Ala Thr Thr Gly Trp Ala Lys
730 735

ttacagagaa tgtatctgag aatggcaacg gtgtttttat tggtgtggtc agtgtttaca 2386

gtgcaaagct gccaatgaaa gttgattcgc aatcattgtg agagaaaacg ggacatccta 2446

aaaaaacgac tgaatgattt aagttattta taaagtctaa atttggtata cttttaatta 2506

aatatacatt ctatgcacaa attaacacag aacgaacaga acatgttaaa ttgcccgtta 2566

aatacttttc tcccatatta gaaagaaaat gcttaatttg gcttaatgct ttaaagaagt 2626

gatgtgtata tacagttgaa gtcagaatta ttagtcgccc tgtttatttt ttcctccaac 2686

ttctgtttaa cggagagaag aattttttaa cacatttcta aatataatag ttttaataac 2746

tcatttaaaa taactgattt attttatctt tgccatgaac acagtgcata atatttgact 2806

agatattttt aaagacactt ctatacagct taaagtgaca tttaaaggct taactaggtt 2866

aattaggtta actaggcagg atagggcaat taggccagtt attttataac gatggtttgt 2926

tctgtagact atcggaaaaa ataattttga ccttaaaatg gtgtttaaaa aattaaaaac 2986

ttcttttatt ctagccg 3003



<210> 10
<211> 735
<212> PRT
<213> Danio rerio

<400> 10

Met Leu Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr Glu Ser
1 5 10 15

Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Ser Lys Arg Phe Ser Ser
20 25 30

His Ser Ser Ile Gly Asn Ile Ser Asn Asp Glu Asn Glu Glu Lys Glu
35 40 45

Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu Gln Trp
50 55 60

Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro Arg Ile
65 70 75 80

Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Leu Asp Ser Gln
85 90 95

Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln Phe Pro
100 105 110

Gln Leu Thr Thr Arg Arg Leu Gly Thr Arg Gly Gln Ser Lys Tyr His
115 120 125

Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp Val Met
130 135 140

Tyr Ser Lys Lys Gly Ala Ala Trp Val Asn Glu Thr Gly Lys Lys Glu
145 150 155 160

Val Thr Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu Gly Thr
165 170 175

Leu Leu Pro Asp Phe Pro Asn Val Lys Asp Leu Asn Leu Pro Ala Ser
180 185 190

Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg Thr His
195 200 205

Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp Glu Val
210 215 220

Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His Met Leu
225 230 235 240

Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val Cys Asp
245 250 255

Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr Val Leu
260 265 270

Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe Ala Lys
275 280 285

Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro Glu Asn
290 295 300

Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln Ile Leu
305 310 315 320

Lys Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg Thr Val
325 330 335

Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp Arg Asn
340 345 350

Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met Glu Asp
355 360 365

Ser Arg Glu Asp Gln Arg Arg Leu Ile Ile Gln Leu Tyr Gln Glu Phe
370 375 380

Asp Arg Leu Leu Glu Asp Gln Ser Pro Ile Glu Ala Tyr Ile Glu Trp
385 390 395 400

Leu Asp Ser Met Val Glu Arg Cys Val Val Arg Val Ala Gly Lys Arg
405 410 415

Pro Gly Ser Leu Lys Arg Val Ala Gln Gln Phe Leu Leu Met Trp Ser
420 425 430

Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser Ala Pro
435 440 445

Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp Tyr Val
450 455 460

Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn Glu Leu
465 470 475 480

Met Arg Ala Met Lys Gly Glu Gly Ala Pro Ala Asp Thr Gly Glu Glu
485 490 495

Leu Met Leu Met Ser Ser Thr Pro Thr Ser Thr Ser Pro Gly Pro Tyr
500 505 510

Ser Pro Ala Lys Ser Val His Ser Val Gly Val Pro Ala Val Gly Ser
515 520 525

Pro Asn Ser Ala Gln Ser Pro Glu Tyr Thr Ser Ile Ser Ala Thr Thr
530 535 540

Gly Ala Val Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val Thr Thr
545 550 555 560

Ser Gly Gly Ser Pro Thr Glu Pro Gly Ser Gln Leu Ser Cys Met Arg
565 570 575

Gly Gly Pro Ala Leu His Gly Ser Ser Ser Ala His Arg Met Pro Val
580 585 590

Tyr Pro His Arg Asp Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr Ser
595 600 605

Ser Tyr Ala Asn Gln His His His Ala Ile Gln Ser Gln Tyr Ser Ser
610 615 620

Leu Thr His Glu Ala Gly Leu Pro Thr Pro Leu His Tyr Ser Ser Tyr
625 630 635 640

His Arg Thr Ser Ala Gln Tyr Pro Leu Asn Ser Gln Met Ser Arg Met
645 650 655

Glu Ser Cys Leu Met Ser Gly Ser Pro Leu Leu His Ser Ser Pro Val
660 665 670

Thr Pro Arg Trp Pro Asp Val Pro Ser Ala Asn Ser Cys Tyr Ser Ser
675 680 685

Pro Thr Val His Ala Ser Arg Tyr Ser Thr Gly Asp Met Tyr Ser Pro
690 695 700

Leu Ala Pro Arg Arg Asn Ser Glu Tyr Glu His Ala Gln His Phe Pro
705 710 715 720

Gly Phe Ala Tyr Ile Asn Gly Glu Ala Thr Thr Gly Trp Ala Lys
725 730 735



<210> 11
<211> 4001
<212> DNA
<213> Homo sapiens

<400> 11

gtgagtgctg ggtttggaaa gtacagggac tgttgagatt acagcagtgg aggcagagca 60

caccccgtga tctaaaaggg atccgccctg caccttcctc tttcatgctg ctcgttcatc 120

cattccaggg gattcagcag ccacccaggt ctgcacaagc agtgccaact ccacaccttt 180

gtgggtctgc ctggctgtca ttgccactgg gctactggga agttttaagg gcccttcaaa 240

cccttctctc ttgctgccaa gtcttccctc ctcatgtcac atcctggtac caatagctac 300

tcattgctca agtcagaaac ctgggaatcc cttctccctc cccatccctg gccacaacta 360

atcaacatca aggctgggaa tacagcttcc ttggtatctc ctgaaccctg ccccaactgc 420

catgtacctg tgtcaggtca tcctgatccc tccctcccac cccctcctgc aaccaacccc 480

catctggcca ccacacagtt tgagatctct tcacattttg tgatctcttc aaaatgtaga 540

tttagtccta tatgcaccaa acccttccag ggtacatttc ttggcatggc cctagcttac 600

ctccccatga aatggcctgt taatcctctc ttcctctctt gggttccgtg tgcagccagg 660

ccaggctcca gtcccaaaca cactccacag gttctcctct tcgcctgtgc ctctgtctgt 720

aacaccctca acccccttgt ctggctaatt cagctcctgg gtcaccactt cctccaggag 780

gcctgccttg acaccagcct cccaggatca gtcacattgc cctctctttg ctctcacagc 840

acccacaatt tccccatcac cccctccctg cccctgtgct atgttctatt gagatttcaa 900

gcctctgcag gataggaatt ggataattca ttactgtgtc cacagtgtct gtcaagggca 960

agagagccac acagaacccc agtagttctc ctaaaagtgg agacgacaat aatatgtatt 1020

agttgtgttg tgaggatgaa atgaactaat gcatggaaga cgcctgaacc cgtggaaaca 1080

cataatgaat acattaatag taaatggtag ctattactat ttatgatgtt gacagcatta 1140

aatctaagca aacattttat tgaagtatga aatacattcc aaaaaatgcg caaaatcaca 1200

agtgtcacag tcagtgaatt gtcacaaatg acacccttgt gactagcccc tggataccat 1260

tacttttaag ctcaaatgtc agcttctcag agaggccctc ccttgacttc gtctcccacc 1320

tcttaattct atgttgtata tattatctgt ttcccctcac tcccatgtaa gcttcaggag 1380

agcaagattt tttttttttt ttttttggtc tattttgttc actgatgtag cctccgcttc 1440

ctagaatagt tctggacaca tagtagatgc acaataaaaa tttgccaagg aacgaatgag 1500

cgattattat tttcatttct ttaagctccc aggcgctgca gcatggtcat gcccgagaac 1560

tcgtgccatc ccaggtgaag cagcgctggg ccgggaccag ccgcacctgg cccggctctg 1620

agctgtgctg ggctggctcc gggttcttcc gcctcactcc tggcctgtga gcccggctca 1680

cctcacccta cccacctcca ctctgcgtgc aaaaattata ataataatag caacaataat 1740

agtctcatcc gccctggaga ccacccgtgc ccccgtggca tccctcaagg actctccggg 1800

cggtggcagc cgcccaccct ggggacgcgc tccttgctgc caccggaacg cccctggcca 1860

ggctccatct acgcgctgtc agaccctccc gccgtctgaa gaaggctttt actcttcagc 1920

ctattccagt ggcagagaag ctaaggctac aaaggcgaac gcgaacagtc agatctgact 1980

tcgaattccg ctgtcattgc tgccaggcgc accacgagga cgcgcggtga ccgccaccat 2040

ggcattcggc tgccaaaggt ttccatcgac ctctttccca tcaccagcat cgcagcggga 2100

aagaatgtgc ctggcgccct tctgggcact gggcatgggg tggtgaacaa agtcctccag 2160

aaataaaccg ggtaatgagc ccggcagcgg ccggggcagg aagggacctt cgcagagagt 2220

ggtcaggcac agcccctccg aggaggcgac gctcagctga gaccagggtg acgcaaaggt 2280

gtcggccggt taggcacctg tgaggaagga ggagccggca gagtgccaag tagagggaac 2340

agcaaatgcc cggctccttt tataaccact gcttcagtta tcttccccca aagcttgaga 2400

gggggcaact ttgctacatt tcacagacga ggaagctgag gcccagaacg atgaaggaat 2460

ttacagagct gggattcgaa ccccgcgcta ccgtcagtcc atcccgggct ctgtccagcc 2520

ggtaccgcgc gccgccttct tcctcccgca ccgtgacctt aactcggcac gtgctggccc 2580

ctcgggctcc ccagtctccg tacattgtcc cactcagctc tgattgtggg gagggggcgg 2640

accgaggggg cggggggcgt ctttccgaag gatcgcggaa agccgcgcgc tgccaggggc 2700

ccggggttag agacccccac tcccgcacgg cgttagggac tccgcgcttc cccgcccccg 2760

ccgcggcccg ccggctctgc ctctgtccat ggtcaaagca cccggggtaa tccgcctttc 2820

tcttccgccc gccgggcccc attcatattc taatcacagc gcggccgacc cgcgaacggc 2880

cactttatcg gggcccgcag gagacgcagc ttgctccccc tcacttccac ttccagcacc 2940

ccccggccct cgcccccctc tttctgcact ttcaactccg ccgaggaggg ggtccctggg 3000

aaaaccgcgt ccccacttgg atgccggggc ttctcacaaa cttcgaggcc gactggggga 3060

cggcggtggg gtggggaggg cagggggagg gcggaggaac agagacagac agactgacag 3120

agttacggga agaggcgggg gaggggggac agtacagaga gaccgagggg gatagagaca 3180

gagaggggca gagtcctagg gggagacaaa gagaagtgga ggcagggtct ggacagagac 3240

actagcagcc aaggaaggag aaatggacag agacagagac acagaggacg agagggacag 3300

agagctagaa acagacaccg ggagacaggc ggagagagac agcgagatgg aaggagagaa 3360

acaggatgaa ggacccaggc ccagaggaag acagaaagtt ctggaggagg cgaaccagcc 3420

actcacctcc tccccgccta gcggccttgt tacgctcata ttggggcatg gggtcttagg 3480

gattcagttc cccttcccca ccctttcccc ttcaagctcg cttcactccc cacgcgtgtc 3540

tgcggatccg cgtgcaaggg gtcaagacat accccctccc gcattctcag ggccaccacc 3600

cgaaatctaa cccaggacca aaatgggggg tgggtggggg cgcaagagaa ggaagggagt 3660

ggggccccac tcgtggtagc gcaggcgact ccccaggctc caggagttcc ccgcggctcc 3720

ccccccgccc gcgcccccct cccggcctgc cagcacggcg cggggcccga tggtggggaa 3780

gggccgggag ccacatctaa gccaattttg atttcgccta taatgagtgc 3840

cgggcgaagg ctggagaagg cctctggaac tttaaataag aaaaacgttg ctaatgctat 3900

aatagaaggg ggaagtcgga gggctgggat tgcgtcgctc tgagcccccc ttttcggagg 3960

cggcttttct tattcaaaac aggcccacaa tgggcttcac a 4001


<210> 12
<211> 207
<212> DNA
<213> Mus musculus

<400> 12

gaggggggca gatctaagcc aattttgatt tcgtctataa tgagtgccgg gctaaggctg 60

gagaaggcct ctggaacttt aaataagaaa aacgttgcta atgctataat agaaggggga 120

agtcggaggg ctgggattgc gtcgctctga gccccccttt tcggaggcgg cttttcttat 180

tcaaaacagg cccacaatgg gcttcac 207



<210> 13
<211> 158
<212> PRT
<213> Danio rerio

<400> 13

Met Leu Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr Glu Ser
1 5 10 15

Trp Ile Glu Arg Cys Leu Asn Glu Ser Glu Ser Lys Arg Phe Ser Ser
20 25 30

His Ser Ser Ile Gly Asn Ile Ser Asn Asp Glu Asn Glu Glu Lys Glu
35 40 45

Asn Asn Arg Ala Ser Lys Pro His Ser Thr Pro Ala Thr Leu Gln Trp
50 55 60

Leu Glu Glu Asn Tyr Glu Ile Ala Glu Gly Val Cys Ile Pro Arg Ile
65 70 75 80

Ala Leu Tyr Met His Tyr Leu Asp Phe Cys Glu Lys Leu Asp Ser Gln
85 90 95

Pro Val Asn Ala Ala Ser Phe Gly Lys Ile Ile Arg Gln Gln Phe Pro
100 105 110

Gln Leu Thr Thr Arg Arg Leu Gly Thr Arg Gly Gln Ser Lys Tyr His
115 120 125

Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp Val Met
130 135 140

Tyr Ser Lys Lys Gly Ala Ala Trp Val Asn Glu Thr Gly Lys
145 150 155



<210> 14
<211> 3369
<212> DNA
<213> Homo sapiens

<220>
<221> CDS
<222> (110)..(2035)

<400> 14

aggtgggaag gcagttatga cagttgagaa gtagtagaag acacggaagg cacagaaggc 60

agacttcgct cagcacaaag aagaattttc tgataaccat actggcaaa atg aac tgg 118
                                                      Met Asn Trp
                                                      1

gct gcc ttc gga ggg tct gaa ttc ttc atc cca gaa ggc att cag ata 166
Ala Ala Phe Gly Gly Ser Glu Phe Phe Ile Pro Glu Gly Ile Gln Ile
5 10 15

gat tcg aga tgc cca cta agc aga aat atc acg gaa tgg tac cat tac 214
Asp Ser Arg Cys Pro Leu Ser Arg Asn Ile Thr Glu Trp Tyr His Tyr
20 25 30 35

tat ggc att gca gtg aaa gaa agc tcc caa tat tat gat gtg atg tat 262
Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp Val Met Tyr
40 45 50

tcc aag aaa gga gct gcc tgg gtg agt gag acg ggc aag aaa gaa gtg 310
Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys Lys Glu Val
55 60 65

agc aaa cag aca gtg gca tat tca ccc cgg tcc aaa ctc gga aca ctg 358
Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu Gly Thr Leu
70 75 80

ctg cca gaa ttt ccc aat gtc aaa gat cta aat ctg cca gcc agc ctg 406
Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro Ala Ser Leu
85 90 95

cct gag gag aag gtt tct acc ttt att atg atg tac aga aca cac tgt 454
Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg Thr His Cys
100 105 110 115

cag aga ata ctg gac act gta ata aga gcc aac ttt gat gag gtt caa 502
Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp Glu Val Gln
120 125 130

agt ttc ctt ctg cac ttt tgg caa gga atg ccg ccc cac atg ctg cct 550
Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His Met Leu Pro
135 140 145

gtg ctg ggc tcc tcc acg gtg gtg aac att gtc ggc gtg tgt gac tcc 598
Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val Cys Asp Ser
150 155 160

atc ctc tac aaa gct atc tcc ggg gtg ctg atg ccc act gtg ctg cag 646
Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr Val Leu Gln
165 170 175

gca tta cct gac agc tta act cag gtg att cga aag ttt gcc aag caa 694
Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe Ala Lys Gln
180 185 190 195

ctg gat gag tgg cta aaa gtg gct ctc cac gac ctc cca gaa aac ttg 742
Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro Glu Asn Leu
200 205 210

cga aac atc aag ttc gaa ttg tcg aga agg ttc tcc caa att ctg aga 790
Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln Ile Leu Arg
215 220 225

cgg caa aca tca cta aat cat ctc tgc cag gca tct cga aca gtg atc 838
Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg Thr Val Ile
230 235 240

cac agt gca gac atc acg ttc caa atg ctg gaa gac tgg agg aac gtg 886
His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp Arg Asn Val
245 250 255

gac ctg aac agc atc acc aag caa acc ctt tac acc atg gaa gac tct 934
Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met Glu Asp Ser
260 265 270 275

cgc gat gag cac cgg aaa ctc atc acc caa tta tat cag gag ttt gac 982
Arg Asp Glu His Arg Lys Leu Ile Thr Gln Leu Tyr Gln Glu Phe Asp
280 285 290

cat ctc ttg gag gag cag tct ccc atc gag tcc tac att gag tgg ctg 1030
His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile Glu Trp Leu
295 300 305

gat acc atg gtt gac cgc tgt gtt gtg aag gtg gct gcc aag aga caa 1078
Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala Lys Arg Gln
310 315 320

ggg tcc ttg aag aaa gtg gcc cag cag ttc ctc ttg atg tgg tcc tgt 1126
Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met Trp Ser Cys
325 330 335

ttc ggc aca agg gtg atc cgg gac atg acc ttg cac agc gcc ccc agc 1174
Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser Ala Pro Ser
340 345 350 355

ttc ggg tct ttt cac cta att cac tta atg ttt gat gac tac gtg ctc 1222
Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp Tyr Val Leu
360 365 370

tac ctg tta gaa tct ctg cac tgt cag gag cgg gcc aat gag ctc atg 1270
Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn Glu Leu Met
375 380 385

cga gcc atg aag gga gaa gga agc act gca gaa gtc cga gaa gag atc 1318
Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg Glu Glu Ile
390 395 400

atc ttg aca gag gct gcc gca cca acc cct tca cca gtg cca tcg ttt 1366
Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val Pro Ser Phe
405 410 415

tct cca gca aaa tct gcc aca tct gtg gaa gtg cca cct ccc tct tcc 1414
Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val Pro Pro Pro Ser Ser
420 425 430 435

cct gtt agc aat cct tcc cct gag tac act ggc ctc agc act aca gga 1462
Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser Thr Thr Gly
440 445 450

gca atg cag tct tac acg tgg tct cta aca tac aca gtg acg acg gct 1510
Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val Thr Thr Ala
455 460 465

gct ggg tcc cca gct gag aac tcc caa cag ctg ccc tgt atg agg aac 1558
Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys Met Arg Asn
470 475 480

act cat gtg cct tct tcc tcc gtc aca cac agg ata cca gtt tat ccc 1606
Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro Val Tyr Pro
485 490 495

cac aga gag gaa cat gga tac acg gga agc tat aac tat ggg agc tat 1654
His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr Gly Ser Tyr
500 505 510 515

ggc aac cag cat cct cac ccc atg cag agc cag tat ccg gcc ctc cct 1702
Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro Ala Leu Pro
520 525 530

cat gac aca gct atc tct ggg cca ctc cac tat gcc cct tac cac agg 1750
His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro Tyr His Arg
535 540 545

agc tct gca cag tac cct ttt aat agc ccc act tcc cgg atg gaa cct 1798
Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg Met Glu Pro
550 555 560

tgt ttg atg agc agt act ccc aga ctg cat cct acc cca gtc act ccc 1846
Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro Val Thr Pro
565 570 575

cgc tgg cca gag gtg ccc tca gcc aac acg tgc tac aca agc ccg tct 1894
Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr Ser Pro Ser
580 585 590 595

gtg cat tct gcg agg tac gga aac tct agt gac atg tat aca cct ctg 1942
Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr Thr Pro Leu
600 605 610

aca acg cgc agg aat tct gaa tat gag cac atg caa cac ttt cct ggc 1990
Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His Phe Pro Gly
615 620 625

ttt gct tac atc aac gga gag gcc tct aca gga tgg gct aaa tga 2035
Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala Lys
630 635 640






U Ly v — l_ CI L> V*Ci L~ 


ayy ua lul.q u 


atttaatatt 


aataataata 


attaataata 


ataataaacc 


2095 






actttatctc 


tatacattgt 


aactcatggg 


ctattcctaa 


2155 


i-t4- prpp /-> 3 4-4-4- 


4— >-» /— • 4- ^ !3 4" /-^r ^ o 

ttClaatydd 


catgaggatg 


ggatcaatgt 


gggatgaata 


aactttagtt 


2215 


CayadaCayg 


dCLLaCtaad 


agtcagtggg 


actgggtttc 


tgtagccaag 


ccagacttga 


2275 


r> 4- rr-h 4- 4- rr-H 


agagcac ua l. 


ctcgggcagg


ccattctgtg 


ccttttccct 


ctgttccatg 


2335 


aCLLLyCLtt 


/~t-4- / "< 4- 4— f^t /— r /—* 

y ty LLyyCaa 


ccacttctag 


taagctactg 


attttcctgt 


tgacaaaatc 


2395 




tgaaggatgg


atactggaga 


cagaatctgg 


tttgtgttct 


tggatgggca 


2455 


<*"» — ^4—— ^^4- 4— 4— — i 
CataattLaC 


CddydgCdLL 


caccttgcca 


tctgtcttgt 


cattgtactg 


tacaaggaac 


2515 


dyCLCLCaga 


uy ty utctyc 


acatcccttc 


ttcctggtgg 


taccatccct 


atttcctgga 


2575 


etc* zx /-"fa rr rr/~t 

yuaL-uayyyu 


uaactLyyyyo 


gctatctgga 


aactctagat 


tttctgtcat 


acccacatct 


2635 


/~r+- r*>2s rt+" a <-*» 


L-tyuaLuy 


ttggaatgta 


agcactgtct 


tgagggaagg 


aagaggtctg 


2695 


l. uuuy uci u u y 


u uaay i» l. y 


attgaggttt 


gtaggagact 


ggttcttcta 


catacaagga 


2755 


tti-crt-nt-f 33 
v- y u ad 


_.4_ 4_ 4_ _ _ 

y u. t_ LyLauaa 


tggctagtgt 


cagcaaaagg 


caggagaggg 


tttttgtttt 


2815 


ttttttaagt tctatgagaa tgtggattta tggcattgag tatcacactc agctctgctg 2875

tgttaacttt gtgaaactgg atggaacaaa ctttaactta ccaagcacca agtgtgaaag 2935

tgactttcac ggttccttca taaaactata ataatatccg acactttgat agaaaaaaat 2995

tcaaagctgt gcctttgagc ctatactata ctgtgtatgt gtggaaataa aaatgtattg 3055

tacttttgga gaattttttg taggcatttt tctgtcagat ttgtagtaat ttgtgaggtt 3115

tgttagagat taatataggt tttctttctg tattataaaa tgcaccaagc aattatggtg 3175

gacctattac cctatgggta agaaataaat ggaaatatga catcggatgt ttcagcaact 3235

gttctgtaaa taaaatcttt gatcacacca ctcagtgtga taattgtgtc tacagctaaa 3295

atggaaatag ttttatctgt acagttgtgc aagatatgaa tggtttcaca ctcaaataaa 3355

aaatattgaa acga 3369
<210> 15
<211> 641
<212> PRT
<213> Homo sapiens

<400> 15

Met Asn Trp Ala Ala Phe Gly Gly Ser Glu Phe Phe Ile Pro Glu Gly
1 5 10 15

Ile Gln Ile Asp Ser Arg Cys Pro Leu Ser Arg Asn Ile Thr Glu Trp
20 25 30

Tyr His Tyr Tyr Gly Ile Ala Val Lys Glu Ser Ser Gln Tyr Tyr Asp
35 40 45

Val Met Tyr Ser Lys Lys Gly Ala Ala Trp Val Ser Glu Thr Gly Lys
50 55 60

Lys Glu Val Ser Lys Gln Thr Val Ala Tyr Ser Pro Arg Ser Lys Leu
65 70 75 80

Gly Thr Leu Leu Pro Glu Phe Pro Asn Val Lys Asp Leu Asn Leu Pro
85 90 95

Ala Ser Leu Pro Glu Glu Lys Val Ser Thr Phe Ile Met Met Tyr Arg
100 105 110

Thr His Cys Gln Arg Ile Leu Asp Thr Val Ile Arg Ala Asn Phe Asp
115 120 125

Glu Val Gln Ser Phe Leu Leu His Phe Trp Gln Gly Met Pro Pro His
130 135 140

Met Leu Pro Val Leu Gly Ser Ser Thr Val Val Asn Ile Val Gly Val
145 150 155 160

Cys Asp Ser Ile Leu Tyr Lys Ala Ile Ser Gly Val Leu Met Pro Thr
165 170 175

Val Leu Gln Ala Leu Pro Asp Ser Leu Thr Gln Val Ile Arg Lys Phe
180 185 190

Ala Lys Gln Leu Asp Glu Trp Leu Lys Val Ala Leu His Asp Leu Pro
195 200 205

Glu Asn Leu Arg Asn Ile Lys Phe Glu Leu Ser Arg Arg Phe Ser Gln
210 215 220

Ile Leu Arg Arg Gln Thr Ser Leu Asn His Leu Cys Gln Ala Ser Arg
225 230 235 240

Thr Val Ile His Ser Ala Asp Ile Thr Phe Gln Met Leu Glu Asp Trp
245 250 255

Arg Asn Val Asp Leu Asn Ser Ile Thr Lys Gln Thr Leu Tyr Thr Met
260 265 270

Glu Asp Ser Arg Asp Glu His Arg Lys Leu Ile Thr Gln Leu Tyr Gln
275 280 285

Glu Phe Asp His Leu Leu Glu Glu Gln Ser Pro Ile Glu Ser Tyr Ile
290 295 300

Glu Trp Leu Asp Thr Met Val Asp Arg Cys Val Val Lys Val Ala Ala
305 310 315 320

Lys Arg Gln Gly Ser Leu Lys Lys Val Ala Gln Gln Phe Leu Leu Met
325 330 335

Trp Ser Cys Phe Gly Thr Arg Val Ile Arg Asp Met Thr Leu His Ser
340 345 350

Ala Pro Ser Phe Gly Ser Phe His Leu Ile His Leu Met Phe Asp Asp
355 360 365

Tyr Val Leu Tyr Leu Leu Glu Ser Leu His Cys Gln Glu Arg Ala Asn
370 375 380

Glu Leu Met Arg Ala Met Lys Gly Glu Gly Ser Thr Ala Glu Val Arg
385 390 395 400

Glu Glu Ile Ile Leu Thr Glu Ala Ala Ala Pro Thr Pro Ser Pro Val
405 410 415

Pro Ser Phe Ser Pro Ala Lys Ser Ala Thr Ser Val Glu Val Pro Pro
420 425 430

Pro Ser Ser Pro Val Ser Asn Pro Ser Pro Glu Tyr Thr Gly Leu Ser
435 440 445

Thr Thr Gly Ala Met Gln Ser Tyr Thr Trp Ser Leu Thr Tyr Thr Val
450 455 460

Thr Thr Ala Ala Gly Ser Pro Ala Glu Asn Ser Gln Gln Leu Pro Cys
465 470 475 480

Met Arg Asn Thr His Val Pro Ser Ser Ser Val Thr His Arg Ile Pro
485 490 495

Val Tyr Pro His Arg Glu Glu His Gly Tyr Thr Gly Ser Tyr Asn Tyr
500 505 510

Gly Ser Tyr Gly Asn Gln His Pro His Pro Met Gln Ser Gln Tyr Pro
515 520 525

Ala Leu Pro His Asp Thr Ala Ile Ser Gly Pro Leu His Tyr Ala Pro
530 535 540

Tyr His Arg Ser Ser Ala Gln Tyr Pro Phe Asn Ser Pro Thr Ser Arg
545 550 555 560

Met Glu Pro Cys Leu Met Ser Ser Thr Pro Arg Leu His Pro Thr Pro
565 570 575

Val Thr Pro Arg Trp Pro Glu Val Pro Ser Ala Asn Thr Cys Tyr Thr
580 585 590

Ser Pro Ser Val His Ser Ala Arg Tyr Gly Asn Ser Ser Asp Met Tyr
595 600 605

Thr Pro Leu Thr Thr Arg Arg Asn Ser Glu Tyr Glu His Met Gln His
610 615 620

Phe Pro Gly Phe Ala Tyr Ile Asn Gly Glu Ala Ser Thr Gly Trp Ala
625 630 635 640

Lys



<210> 16 

<211> 23 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 16 

aggtgggaag gcagttatga cag 23 

<210> 17 
<211> 25 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 17 

tccgtgatat ttctgcttag tgggc 25 



<210> 18 

<211> 28 

<212> DNA 

<213> artificial sequence 



<220> 

<223> synthetic oligonucleotide primer 
<400> 18 

ggcagttatg acagttgaga agtagtag 28 



<210> 19 

<211> 27 

<212> DNA 

<213> artificial sequence 



<220> 

<223> synthetic oligonucleotide primer 

<400> 19 

ctgcttagtg ggcatctcga atctatc 27 



<210> 20 

<211> 20 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 



<400> 20 

ttttgacggg tttggctttg 20 



<210> 21 

<211> 22 

<212> DNA 

<213> artificial sequence 



<220> 

<223> synthetic oligonucleotide primer 
<400> 21 

ttcctccagt aacccacaat gc 22 



<210> 22 
<211> 21 



WO 03/088919 






PCT/US03/12348 



<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 22 

tggagaggcc acagctgctg g 21 


<210> 23 

<211> 20 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 23 

tcgaggcctg gtcctgtcgc 20 

<210> 24 

<211> 20 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 24 

cacagctgct ggcttcctgg 20 

<210> 25 

<211> 25 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 25 

acaactctgc gatgggctct gcttt 25 

<210> 26 

<211> 25 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 26 

ctgaccaatt tgacggcgct gcaca 25 

<210> 27 

<211> 21 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 27 

ggccattgtc accactcgta a 21 

<210> 28 

<211> 21 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 28 

cacaagtaaa ggctaacgcg c 21 

<210> 29 

<211> 22 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 29 

agccagtaat aagaactgca ga 22 

<210> 30 

<211> 22 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 30 

ggcactctta gcaaacctca gg 22 

<210> 31 

<211> 21 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 31 

catggaaagg gcagagtgag c 21 



<210> 32 

<211> 21 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic oligonucleotide primer 

<400> 32 

ggccattgtc accactcgta a 21 



<210> 33 

<211> 14 

<212> PRT 

<213> Homo sapiens 

<400> 33 

Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr 
1 5 10 



<210> 34 

<211> 14 

<212> PRT 

<213> Mus 

<400> 34 

Met His Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr 
1 5 10 



<210> 35 

<211> 14 

<212> PRT 

<213> Danio rerio 

<400> 35 

Met Leu Cys Gly Leu Leu Glu Glu Pro Asp Met Asp Ser Thr 
1 5 10 



<210> 36 

<211> 223 

<212> DNA 

<213> Homo sapiens 

<400> 36 



ctttggtgca gtgagagccg cctttcatag gaaaacagtt tgtgctcctg actgggccac 60 

ctttcacccc ttgttcaagt agcagctcat ttggtaaggg gtcaggaata aagggctctt 120 

tcttccctct ccatgtgtag gaaagtcagc ccttggtgtg gagagtcatt tctcaaaata 180 

gatcttccta atatggttcc aaagagagca agagtcagtc aca 223 



<210> 37 

<211> 2208 

<212> DNA 

<213> Homo sapiens 
<400> 37 

atgcattgtg ggttactgga ggaacccgac atggattcca cagagagctg gattgaaaga 60 

tgtctcaacg aaagtgaaaa caaacgttat tccagccaca catctctggg gaatgtttct 120 

aatgatgaaa atgaggaaaa agaaaataat agagcatcca agccccactc cactcctgct 180 

actctgcaat ggctggagga gaactatgag attgcagagg gggtctgcat ccctcgcagt 240 

gccctctata tgcattacct ggatttctgc gagaagaatg atacccaacc tgtcaatgct 300 

gccagctttg gaaagatcat aaggcagcag tttcctcagt taaccaccag aagactcggg 360 

acccgaggac agtcaaagta ccattactat ggcattgcag tgaaagaaag ctcccaatat 420 

tatgatgtga tgtattccaa gaaaggagct gcctgggtga gtgagacggg caagaaagaa 480 

gtgagcaaac agacagtggc atattcaccc cggtccaaac tcggaacact gctgccagaa 540 

tttcccaatg tcaaagatct aaatctgcca gccagcctgc ctgaggagaa ggtttctacc 600 

tttattatga tgtacagaac acactgtcag agaatactgg acactgtaat aagagccaac 660 

tttgatgagg ttcaaagttt ccttctgcac ttttggcaag gaatgccgcc ccacatgctg 720 

cctgtgctgg gctcctccac ggtggtgaac attgtcggcg tgtgtgactc catcctctac 780 

aaagctatct ccggggtgct gatgcccact gtgctgcagg cattacctga cagcttaact 840 

caggtgattc gaaagtttgc caagcaactg gatgagtggc taaaagtggc tctccacgac 900 

ctcccagaaa acttgcgaaa catcaagttc gaattgtcga gaaggttctc ccaaattctg 960 

agacggcaaa catcactaaa tcatctctgc caggcatctc gaacagtgat ccacagtgca 1020 

gacatcacgt tccaaatgct ggaagactgg aggaacgtgg acctgaacag catcaccaag 1080 

caaacccttt acaccatgga agactctcgc gatgagcacc ggaaactcat cacccaatta 1140 

tatcaggagt ttgaccatct cttggaggag cagtctccca tcgagtccta cattgagtgg 1200 

ctggatacca tggttgaccg ctgtgttgtg aaggtggctg ccaagagaca agggtccttg 1260 

aagaaagtgg cccagcagtt cctcttgatg tggtcctgtt tcggcacaag ggtgatccgg 1320 

gacatgacct tgcacagcgc ccccagcttc gggtcttttc acctaattca cttaatgttt 1380 

gatgactacg tgctctacct gttagaatct ctgcactgtc aggagcgggc caatgagctc 1440 

atgcgagcca tgaagggaga aggaagcact gcagaagtcc gagaagagat catcttgaca 1500 

gaggctgccg caccaacccc ttcaccagtg ccatcgtttt ctccagcaaa atctgccaca 1560 

tctatggaag tgccacctcc ctcttcccct gttagcaatc cttcccctga gtacactggc 1620 

ctcagcacta caggagcaat gcagtcttac acgtggtctc taacatacac agtgacgacg 1680 

gctgctgggt ccccagctga gaactcccaa cagctgccct gtatgaggaa cactcatgtg 1740 

ccttcttcct ccgtcacaca caggatacca gtttatcccc acagagagga acatggatac 1800 

acgggaagct ataactatgg gagctatggc aaccagcatc ctcaccccat gcagagccag 1860 

tatccggccc tccctcatga cacagctatc tctgggccac tccactatgc cccttaccac 1920 

aggagctctg cacagtaccc ttttaatagc cccacttccc ggatggaacc ttgtttgatg 1980 

agcagtactc ccagactgca tcctacccca gtcactcccc gctggccaga ggtgccctca 2040 

gccaacacgt gctacacaag cccgtctgtg cattctgcga ggtacggaaa ctctagtgac 2100 

atgtatacac ctctgacaac gcgcaggaat tctgaatatg agcacatgca acactttcct 2160 

ggctttgctt acatcaacgg agaggcctct acaggatggg ctaaatga 2208 

<210> 38 

<211> 2214 

<212> DNA 

<213> Mus musculus 

<400> 38 

atgcattgtg ggttactgga ggaacccgac atggattcca cagagagctg gattgaaaga 60 

tgtctcaatg aaagcgagaa taaacgctat tccagtcaca catctctggg gaatgtgtct 120 

aatgatgaaa atgaggaaaa agaaaataac agagcatcca agccccactc cacgccggcc 180 

accctgcaat ggctggagga aaactatgag attgctgagg gcgtctgcat cccccgcagc 240 

gccctctaca tgcactacct ggatttctgt gagaagaacg acactcagcc tgtcaatgct 300 

gccagctttg ggaagatcat aaggcagcag tttcctcagc taaccaccag aagactcggg 360 

accgggaccc gaggacagtc aaagtaccat tactatggca tagcggtgaa ggagagctcc 420 

cagtattatg atgtgatgta ctcaaagaaa ggagctgcct gggtgagcga gacgggcaag 480 

agagaagtca ccaagcagac ggtggcatat tctccccggt ccaagcttgg gacattgctg 540 

ccagactttc caaacgtcaa agacctaaat ctgccagcca gtcttcctga ggagaaggtg 600 

tctaccttta ttatgatgta cagaacacac tgtcagagaa tactggacac tgtaataaga 660 

gccaactttg atgaggttca aagtttcctt ctgcactttt ggcaagggat gccgccccac 720 

atgctgcccg tgctaggctc ctccacggtg gtgaacatcg tgggtgtgtg tgactccatc 780 

ctctacaaag ccatctccgg tgtgttgatg cccacggtgc tgcaggcgtt gccggacagc 840 

ttaactcagg tgatccgaaa gtttgccaag cagctggacg agtggctgaa agtggctctc 900 

cacgatctcc cggaaaacct gagaaacatc aaatttgaat tatcaaggag gttttcccaa 960 

atcctaagga ggcaaacatc gctgaaccat ctgtgccagg catctcgaac ggtgatccac 1020 

agtgcagaca tcacgttcca gatgctggag gactggagga atgtggacct gagtagcatc 1080 

accaagcaga ctctgtatac catggaggac tctcgggatg agcaccgcag actcatcatc 1140 

cagttgtacc aggagtttga ccacctgctg gaggaacagt cccccatcga gtcttacata 1200 
gaatggctgg ataccatggt agaccgatgc gttgtaaagg tggctgccaa gagacaaggg 1260 

tctctgaaga aagtagccca acagttcctg ctgatgtggt cttgctttgg tacgagggtg 1320 

atccgggaca tgaccttgca cagtgccccc agcttcgggt cttttcacct gattcacctg 1380 

atgttcgacg actacgtgct ctacttgcta gaatctctgc attgtcagga gcgggccaac 1440 

gagctcatgc gagccatgaa aggagaagga agcactgcag aagcccagga agagattatc 1500 

ttgacagagg ctaccccacc aaccccttca cctggtccat cattttctcc agcaaagtct 1560 

gccacatctg tggaggtgcc acctccctcc tcccctgtca gcaacccatc ccccgaatac 1620 

actggcctta gcacagcagg agcgatgcag tcatatacgt ggtcgctaac atatacagta 1680 

acaacggctg cagggtcacc ggctgagaac tcccaacaac taccctgtat gaggagcacc 1740 

catatgcctt cttcctccgt cacacacagg ataccagtct actcccacag agaggagcat 1800 

gggtacacgg gaagctataa ctacgggagc tatggcaacc agcatcctca cccactgcag 1860 

aaccagtatc cagccttgcc tcatgacaca gccatctctg ggcctctcca ctattcccct 1920 

taccacagga gctctgccca gtaccctttc aatagcccca cttccaggat ggaaccttgt 1980 

ttgatgagca gtactcccag gctgcatcct accccagtga ctccccgatg gccagaggtg 2040 

ccgactgcca acgcatgcta cacaagccca tctgtgcatt ccacgaggta tggaaactct 2100 

agtgacatgt acaccccgct gaccacgcgc aggaattctg agtatgagca catgcaacac 2160 

tttcctggct ttgcttacat caacggagag gcctccactg gatgggctaa gtga 2214 

<400> 39 

atgctttgtg ggctgctgga agagcctgac atggattcca cagagagctg gattgaaaga 60 

tgtctgaacg aaagcgagag caagcgcttc tccagccact cttctattgg aaatatttcc 120 

aacgacgaaa acgaagagaa ggaaaataac cgagcatcta agccacattc aacacctgct 180 

acattacaat ggttggagga gaactacgag atcgcagagg gtgtgtgtat tcctcgcatc 240 

gccctgtaca tgcactacct ggacttctgc gaaaaactgg actcacagcc agtcaatgct 300 

gcaagcttcg gaaagataat aaggcagcag tttcctcagt tgaccacgcg gagattagga 360 

actagaggtc aatcaaagta tcattactat ggcatcgcag tgaaggagag ctcccagtac 420 

tacgatgtga tgtactctaa aaagggcgct gcgtgggtga acgagacggg caagaaagag 480 

gtcaccaaac agacagtagc gtattcaccg cgctccaagc tgggcactct cctgccagac 540 

tttccaaatg tcaaagacct aaatctgccc gccagtctgc cagaggagaa ggtctcgacc 600 
tttattatga tgtacagaac tcactgccag aggatactgg atactgtcat acgcgccaac 660 

ttcgatgagg ttcagagctt cctgttgcac ttttggcagg gcatgccgcc ccacatgctc 720 

cctgtcctgg gctcttctac agtggtcaac atagtgggtg tgtgtgactc catattgtac 780 

aaggccatct caggcgtcct catgcccacc gtcctacaag ctctgcctga cagcctcact 840 

caggtgatca ggaagtttgc caagcagctg gacgagtggc tgaaggtggc tttacatgac 900 

ctgcccgaaa acctgcgcaa cattaagttt gaattgtcaa gaagattttc tcagattctc 960 

aaacgacaaa catcattaaa ccacctctgt caggcctctc gaacagtgat ccacagtgca 1020 

gacatcacct ttcagatgct cgaggactgg aggaacgtag acctcaacag catcactaaa 1080 

caaactcttt atactatgga agactccaga gaagaccaga ggagactcat catccaattg 1140 

tatcaagaat ttgacagact gctagaggac cagtctccaa ttgaagccta catcgagtgg 1200 

ctggactcta tggtggagag atgtgttgtg agggtggcgg ggaagagacc cggatctctg 1260 

aagagggtag ctcagcagtt cctgctcatg tggtcgtgtt ttgggacaag agttatccgg 1320 

gatatgacgc tgcatagtgc accaagcttt ggctcgttcc atctgattca cctcatgttt 1380 

gatgactatg tactttacct gcttgaatct ctgcactgcc aagagagagc caatgaactg 1440 

atgagggcga tgaaaggaga gggcgcacca gcagatactg gagaagagct gatgctgatg 1500 

agctccactc caacatctac gtcacctgga ccctactctc ctgccaaatc tgttcactcg 1560 

gtgggcgtac ccgcagtagg gtcccccaat tcagcccagt ctccggagta caccagcata 1620 

tcggccacaa caggagctgt tcagtcatat acctggtccc ttacatacac agtgacaact 1680 

tcaggcggca gcccaaccga gcccggatcc cagctgtcct gcatgagagg cggacctgcg 1740 

ttacacggat catcctccgc acaccggatg ccagtttacc cacatcggga tgagcacggg 1800 

tacactggca gctataatta cagcagctac gcaaaccagc accatcatgc cattcagagt 1860 

caatactcca gtttaaccca tgaagcaggg ctgcccactc ctttgcatta ttcctcatac 1920 

caccgcacct ccgcacagta tccgctcaac agtcaaatgt ccagaatgga gtcgtgtcta 1980 

atgagcggct ctcctctcct acactccagt ccagtgaccc ctcgatggcc cgatgtgccc 2040 

tctgccaaca gctgttactc cagtcccacc gtccacgcat cccgctactc caccggagac 2100 

atgtactcgc cccttgcccc acgcaggaac tctgaatacg agcacgcaca acactttcca 2160 

ggattcgcct atattaacgg ggaggccacg accggatggg caaaatga 2208 